Slashdot Mirror


Design, Hardware, Software Errors Doomed Japanese Hitomi Spacecraft (scientificamerican.com)

Reader Required Snark writes: The Japanese space agency JAXA said its recently launched X-Ray observation satellite Hitomi has been destroyed. After a successful launch on February 17, contact with the satellite was lost on March 28. Of the 10-year expected life span, only three days of observations were collected. Preliminary inquiry points to multiple failures in design, hardware and software. After the launch it was discovered that the star tracker stabilization didn't work in a low magnetic flux area over the South Atlantic. When the backup gyroscopic spin stabilization took control, the spin increased instead of stopping. An internal magnetic limit feature in the gyroscope failed, causing the spin to get worse. Finally, a thruster-based control started, but because of a software failure the spin increased further. The solar panels broke off, leaving the satellite without a long-term power supply. It seems that untested software had been uploaded for thrust control just before the breakup. This is a major loss for astronomical research. Two previous attempts by Japan to launch a high-resolution X-ray calorimeter had also failed, and the next planned sensor of this type is not scheduled until 2028 by the ESA. Just building a replacement unit would take 3 to 5 years and cost $50 million, without the cost of a satellite or launch.

101 comments

  1. Is that all? by Dan+East · · Score: 4, Funny

    Design, Hardware, Software Error

    Oh, is that all?

    --
    Better known as 318230.
    1. Re: Is that all? by Anonymous Coward · · Score: 0

      I was hit by this bug in the 1.1 prerelease too. Reaction wheels would randomly start creating torque out of nowhere and spin your satellite to death. Sad news for JAXA, but they really should have backed up their .craft file.

    2. Re: Is that all? by Anonymous Coward · · Score: 0

      I think this is the same version that North Korea has been trying to use for their ICBMs.

    3. Re:Is that all? by phrostie · · Score: 3, Interesting

      it sounds like everyone is starting from scratch every time a project like this is built.
      regardless of success or fail, wouldn't it be best for everyone to release the engineering and software so that the next one is an improvement over what went before.
      it also might reduce the life cycle of the next project.

      just my .01999 USD

    4. Re:Is that all? by Anonymous Coward · · Score: 0

      maybe you should take your own advice

    5. Re:Is that all? by Anonymous Coward · · Score: 2, Informative

      On the first launch of the Ariane 5 rocket, it used parts of the control software of the Ariane 4, a very reliable rocket with a success rate of more than 97%. The launch ended with the destruction of the rocket 37 seconds into the flight due to an arithmetic overflow. It had not been taken into account that the bigger rocket would cause bigger values in the control software.

    6. Re:Is that all? by Anonymous Coward · · Score: 0

      The magnetic equivalent of Bermuda Triangle is not for the faint of heart, or with design, hardware and software errors.

    7. Re:Is that all? by Anonymous Coward · · Score: 1

      There have been some attempts to create standardized satellites, a premade box with batteries, gyroscopes, thrusters, solar panels, etc with a cavity inside for mission modules (communications, photography, etc). And they make sense for general uses where you're not trying to do anything too advanced. However with highly advanced/sensitive instruments like in space telescopes it's not really practical since each instrument has a litany of things that can throw off its measurements. The James Webb telescope for example (if they can ever get it launched) has a complicated design where its instruments are all on the other side of an infrared umbrella from its control/power systems to prevent them (and the sun) from throwing off the readings. In more contemporary space telescopes gyroscopes, batteries, etc probably have to be specially placed to prevent them from shaking the camera, or heat from them warping the instruments on a nanometer level. You might be able to standardize some of the components, maybe even some of the control systems, but each satellite will probably require significant reconfiguration and calibration to make sure that the satellite itself doesn't throw off the science it hopes to achieve.

    8. Re:Is that all? by Tablizer · · Score: 1

      Bad art

    9. Re:Is that all? by EmperorOfCanada · · Score: 1

      I would have thought that most of this would be plug and play by now. Not so much that every component is the same, but that how the components interface would be the same. Much like when I hook a new more sensitive mouse, or a better printer to my computer, I don't need to reconfigure my browser. There can't be that many unique systems on any given platform. Gyros, rockets, sensors, etc. Maybe today's gyro package is way better than yesterday's but I would think that you just make the interface capable of handling an insanely great gyro and then don't worry about it for many many generations of gyros.

      I would also have thought that there would be simulators for most of this crap. Things that would give a nice broad-spectrum test of a system with various temperatures, magnetic flux, radiation, etc. Something where they could test just about every strange scenario that space throws at a system over its lifetime. But in spades.

      Also by standardizing some of this crap, they could also effectively opensource their solutions. It seems fairly common to hear heroic stories where a system will have 5 gyros with a minimum of 3 required to operate. But then 4 of them fail and the geniuses figure out how to keep the system limping along on that single gyro. That is a perfect bit of software to then just incorporate into all future systems.

      Lastly why isn't there a "your commands are stupid, let's wait for manual control" edge case analyser? Things like spinning too fast, very long burns on the rockets, etc. If the computer decides to do one of these unusual things, it instead yells to ground control, "Hey, could you double check to make sure that I don't have a case of the stupids."
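
      A rough sketch of what such an "edge case analyser" might look like, purely as an illustration. The names `Limits` and `vet_command` and both thresholds are made up here; nothing below reflects how Hitomi or any real spacecraft actually vets its commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Both thresholds are made-up illustration values, not real spacecraft limits.
    max_spin_deg_per_s: float = 5.0
    max_burn_s: float = 30.0


def vet_command(spin_deg_per_s, burn_s, limits=Limits()):
    """Return ("EXECUTE", []) or ("HOLD_FOR_GROUND", [reasons])."""
    reasons = []
    if abs(spin_deg_per_s) > limits.max_spin_deg_per_s:
        reasons.append("spin rate outside limit")
    if burn_s > limits.max_burn_s:
        reasons.append("burn longer than limit")
    if reasons:
        # Refuse to act autonomously; wait for a human to confirm.
        return "HOLD_FOR_GROUND", reasons
    return "EXECUTE", reasons


print(vet_command(0.5, 4.0))    # ordinary command
print(vet_command(12.0, 90.0))  # unusual command, held for ground control
```

      The catch is that a satellite is often out of contact with the ground, so "wait for manual control" is not always an available option; a check like this would presumably have to fall back to some safe default rather than simply wait.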

    10. Re:Is that all? by Eunuchswear · · Score: 1

      You don't know much about junkies if you think they want anyone to suck their dicks. They're not into getting high with natural brain chemistry.

      Junkies suck dicks, nobody sucks junkie dicks.

      --
      Watch this Heartland Institute video
    11. Re:Is that all? by Eunuchswear · · Score: 2

      On the first launch of the Ariane 5 rocket, it used parts of the control software of the Ariane 4, a very reliable rocket with a success rate of more than 97%. The launch ended with the destruction of the rocket 37 seconds into the flight due to an arithmetic overflow. It had not been taken into account that the bigger rocket would cause bigger values in the control software.

      It was great: they used software coded in Ada that detected the overflow and raised an exception, disabling the faulty part. The work was then taken over by the backup system which, being identical, did exactly the same thing. Whoops.
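
      The failure pattern described above can be sketched in a few lines. This is only an illustration of "overflow raises an exception, identical backup repeats it"; the function names, the 16-bit range check and the sample values are assumptions for the example, not the actual flight code.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value does not fit in a signed 16-bit integer."""


def to_int16(value):
    # Checked conversion: refuse to silently wrap an out-of-range value.
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result


def run_unit(measurement):
    """Hypothetical guidance unit: a reading, or None if it shut itself down."""
    try:
        return to_int16(measurement)
    except OperandError:
        return None  # the unit disables itself


def read_with_failover(measurement):
    # Primary and backup run the same code on the same input.
    for name in ("primary", "backup"):
        reading = run_unit(measurement)
        if reading is not None:
            return name, reading
    return None, None  # both units are down


print(read_with_failover(1200.0))   # value in range: primary answers
print(read_with_failover(40000.0))  # value out of range: nobody answers
```

      Redundancy of this kind protects against a hardware fault in one unit, but not against a bug that both units share.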

      --
      Watch this Heartland Institute video
    12. Re:Is that all? by Eunuchswear · · Score: 2

      I would also have thought that there would be simulators for most of this crap.

      A KSP add-on?

      --
      Watch this Heartland Institute video
    13. Re:Is that all? by laing · · Score: 1

      The main issue with Ariane 5 was the in-band error code being sent from the accelerometer when the force during liftoff exceeded its specification. The in-band error code was interpreted by the flight control system as a valid data value. Things went very wrong from then on.
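
      To illustrate the general hazard of in-band error codes (not the actual Ariane data format), here is a small sketch. The sentinel value, the scale factor and the names `ERROR_SENTINEL`, `Reading`, `steer_in_band` and `steer_out_of_band` are all hypothetical.

```python
from typing import NamedTuple, Optional

ERROR_SENTINEL = 0x7FFF  # hypothetical error code sent on the data channel


def steer_in_band(raw_word):
    # A consumer unaware of the sentinel scales it like any other sample.
    return raw_word * 0.01


class Reading(NamedTuple):
    ok: bool                 # status travels separately from the value
    value: Optional[float]


def steer_out_of_band(reading, last_good):
    # With an explicit status flag, a fault cannot be mistaken for data.
    if not reading.ok:
        return last_good
    return reading.value


print(steer_in_band(ERROR_SENTINEL))                  # error code treated as a huge reading
print(steer_out_of_band(Reading(False, None), 1.5))   # fault detected, last good value kept
```

      Holding the last good value is just one possible reaction; the point is only that the receiver gets the chance to decide.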

    14. Re:Is that all? by Methadras · · Score: 1

      Space is hard. For the japs moreso. Some nations should just stop bothering with their own space programs. There is nothing they can glean that they couldn't be doing a collaborative effort with countries that do space well.

  2. 3 days of data? by JustAnotherOldGuy · · Score: 3, Insightful

    Only got 3 days of data? Damn, that's gotta hurt.

    Also, the "Design, Hardware, Software Error" bit is funny in a way...I mean, what else was left to screw up? This was like the Trifecta of Fuckups.

    --
    Just cruising through this digital world at 33 1/3 rpm...
    1. Re:3 days of data? by Anonymous Coward · · Score: 0

      "I mean, what else was left to screw up?"
      PR

    2. Re:3 days of data? by Anonymous Coward · · Score: 1

      They could have given it bad instructions (i.e. user error).

    3. Re:3 days of data? by mrbester · · Score: 1

      Flying over the South Atlantic Anomaly. It's not like it wasn't known it is there and causes issues that should be tested for before launch.

      --
      "Wait. Something's happening. It's opening up! My God, it's full of apricots!"
    4. Re:3 days of data? by CODiNE · · Score: 1

      A Whathefecta!

      --
      Cwm, fjord-bank glyphs vext quiz
    5. Re:3 days of data? by Anonymous Coward · · Score: 0

      Operator Error?

  3. This would be an excellent time for them to post by Johnny+Loves+Linux · · Score: 1

    on Reddit's TIFU: https://www.reddit.com/r/tifu/

  4. It's only when the backups kick in... by Velox_SwiftFox · · Score: 1

    ... that you find they were wired backwards.

    1. Re:It's only when the backups kick in... by mrbester · · Score: 2

      If only. All they'd need to do in that case is "reverse the polarity"

      --
      "Wait. Something's happening. It's opening up! My God, it's full of apricots!"
  5. Schadenfreude! by Anonymous Coward · · Score: 0

    Subject says it all.

  6. Software uploaded before breakup. by fahrbot-bot · · Score: 3, Funny

    It seems that untested software had been uploaded for thrust control just before the breakup.

    See what happens when you don't disable the GWX settings.

    --
    It must have been something you assimilated. . . .
  7. I feel sorry for that guy... by Anonymous Coward · · Score: 5, Informative

    From the TFA

    Dan McCammon, an astronomer at the University of Wisconsin–Madison, helped to design and build Hitomi’s premiere scientific instrument, an X-ray calorimeter that measures the energy of X-ray photons with exquisite precision. He has been working on the technology for more than three decades, flying versions of it on the ASTRO-E mission, which failed on launch in 2000, and the Suzaku spacecraft, in which a helium leak rendered the instrument useless weeks after its 2005 launch.

  8. oops by Anonymous Coward · · Score: 0

    Does anybody else think that their insurance company may not pay out?

  9. Suggestion to JAXA by Anonymous Coward · · Score: 1

    Re-appoint your entire senior software team, especially the lead. Examine the engineering background of the rest.

    Hardware fails, that's completely inevitable. Software of the kind we're talking about is meant to limit the impact of independent hardware failures, which it can do because its own failure modes can be given however many fractional 9's of perfect reliability you desire, limited only by available resources.

    From the reports, it seems clear that the probe's software was not designed to do that, and the failures of process which started off the event were also not designed using defensive and self-corrective principles.

    In other words, this was entirely a people problem, a failure caused by using system software designers who lack the engineering mindset and extreme cautions needed when handling systems of this kind.

    The poor software should actually have been caught in an external design audit in advance of launch, and in simulation. Investigate why it wasn't, and you'll probably find yet another people problem.

    1. Re:Suggestion to JAXA by joe_frisch · · Score: 2

      You don't want to knee-jerk it. Who approved the upload of untested software and why. There could be a valid reason - say a fatal bug discovered in the existing code and no way to change the launch schedule. It could be budget pressure - simply not enough money to test. It could be plain incompetence.

    2. Re:Suggestion to JAXA by Anonymous Coward · · Score: 0

      Don't stop with the software team, keep going up the ladder. Management up through the highest level needs to be closely scrutinized and cleansed of incompetence and political susceptibility.

    3. Re:Suggestion to JAXA by GrumpySteen · · Score: 1

      Fuck it. Just kill off the entire human race and start over.

    4. Re:Suggestion to JAXA by sysrammer · · Score: 1

      Fuck it. Just kill off the entire human race and start over.

      Tried that once. Didn't work.

      --
      His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain
  10. Open source satellite software? by Midnight+Thunder · · Score: 1

    If the satellite is being designed and built by a government organisation, in the name of the advancement of human knowledge, should we be encouraging the software to be open source? Have there been examples of such initiatives?

    --
    Jumpstart the tartan drive.
    1. Re:Open source satellite software? by Anonymous Coward · · Score: 0

      It was probably running Linux, first mistake.

    2. Re:Open source satellite software? by tomhath · · Score: 1

      Some is available. But keep in mind that "civilian" space programs are usually thinly disguised military projects, so much of what's really happening is not made public.

    3. Re:Open source satellite software? by Anonymous Coward · · Score: 0

      Yes.. I work on NASAs open source flight software:
      http://cfs.gsfc.nasa.gov

      It also runs on an open source real time OS: RTEMS
      http://www.rtems.org

    4. Re:Open source satellite software? by jc42 · · Score: 3, Interesting

      It was probably running Linux, first mistake.

      Nah; it was probably running ITRON. It may well have included a POSIX library, but that wouldn't qualify it as a version of linux, even if some linux code is included there.

      I haven't actually bothered to dig up the info, but that's what anyone acquainted with how such things are done in Japan would guess for a situation with serious RT requirements. Maybe it'd be interesting to investigate, to get an idea whether the OS and system libraries might have had anything to do with the failures.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    5. Re:Open source satellite software? by Anonymous Coward · · Score: 0

      Open source works by failing quickly and failing often....

    6. Re:Open source satellite software? by Anonymous Coward · · Score: 0

      Thanks for the links (also, nice job!). Just FYI, I'm getting an invalid cert on cfs.gsfc.nasa.gov (I use HTTPS Everywhere):

      cfs.gsfc.nasa.gov uses an invalid security certificate.

      The certificate is not trusted because the issuer certificate is unknown.

      (Error code: sec_error_unknown_issuer)

    7. Re:Open source satellite software? by Midnight+Thunder · · Score: 1

      Just looked and the CA is the US government and it is valid until 2018. This is likely valid, just not a certificate authority most browsers have by default?

      --
      Jumpstart the tartan drive.
    8. Re:Open source satellite software? by Midnight+Thunder · · Score: 1

      Some is available. But keep in mind that "civilian" space programs are usually thinly disguised military projects, so much of what's really happening is not made public.

      Thanks for the link. What you say makes sense, though I thought I would ask anyhow, since there is likely a shift between what is considered knowledge limited to military use?

      --
      Jumpstart the tartan drive.
    9. Re:Open source satellite software? by Anonymous Coward · · Score: 0

      It was probably running Linux, first mistake.

      It was running Windows 8, which absolutely explains everything. Doesn't it. MS promised them 2 free upgrades to Windows 10, so they couldn't refuse.

  11. But don't worry by Anonymous Coward · · Score: 0

    We're totally colonizing the universe any day now.

  12. Re:Good by Midnight+Thunder · · Score: 2

    Space is dead. It's a radiation-blasted vacuum. Nobody is going to live there. Ever. Get over it, Space Nutters. We should kill all astrophysicists and burn all scifi books. Like in Europe.

    Europe got bored of that and the sport is now found elsewhere in the world. I for one welcome space nutters, since they give us something else to talk about :) I would burn the trolls, but, not considering myself a violent person, will accept making a sport of them.

    --
    Jumpstart the tartan drive.
  13. Those are not software and hardware errors -- by vmaxxxed · · Score: 5, Interesting

    Those are called political and budget pressure by managers who have no clue on engineering ---

    Software uploaded without testing? There is no way they could have gotten this far without testing. I am sure there is no engineer in Japan that does not test thoroughly. Actually Japanese code is famous for being of the best quality -

    This was caused by politics, bureaucracy and plain bad management.

    1. Re:Those are not software and hardware errors -- by gweihir · · Score: 2

      Indeed. And very likely by a culture of "not contradicting the boss". An engineer that is unwilling to "contradict the boss" is a bad engineer, no matter what other skills he has. Of course, many bosses simply get rid of the "naysayers" and foster a culture of "can do". The results are invariably what we see in this story, although many managers manage to conceal that they were responsible for quite a while and sometimes forever. If the damage is huge, it is very rarely the engineers that have screwed up.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:Those are not software and hardware errors -- by rubycodez · · Score: 1

      Doc: "No wonder this circuit failed; it says 'Made in Japan'" --Back to the Future

    3. Re:Those are not software and hardware errors -- by Viol8 · · Score: 1

      >An engineer that is unwilling to "contradict the boss" is a bad engineer,
      >no matter what other skills he has. Of course, many bosses simply get
      >rid of the "naysayers"

      And there's your problem right there. What would you rather be - a "bad" engineer who can still pay the mortgage/rent, or a righteous engineer who's now looking for work and could be on the street in a few months if he doesn't get a new job?

    4. Re:Those are not software and hardware errors -- by Anonymous Coward · · Score: 0

      >> there is no engineer in Japan that does not test thoroughly
      You should google and read "the single bit flip that killed" about the post-mortem on the software in toyota cars. Japanese engineers are subject to the same schedule pressures and shortcuts as anyone else

    5. Re:Those are not software and hardware errors -- by gweihir · · Score: 2

      I most certainly do not want to be the engineer responsible for a spectacular failure. Of course, the software field has far too many "engineers" and many of them are bad in other ways, which makes the problem worse. But while I work on a level where I not only can speak up but am required to speak up, I can understand the person that decides to keep quiet.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:Those are not software and hardware errors -- by Plus1Entropy · · Score: 1

      What would you rather be

      Without even a second of hesitation, the latter. I live in Canada, so there's no at-will or right-to-work or whatever the hell it's called (don't know the difference or care), so good luck firing me for trying to do the right thing. Especially since we have professional organizations backing us up, the company has a hell of a lot more to lose than I do.

      Even if that wasn't the case, my answer doesn't change. If I just wanted to make money slaving under someone else's will with no creative say in my work, damning the consequences, there's way better options than engineering.

      --
      Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
    7. Re: Those are not software and hardware errors -- by Anonymous Coward · · Score: 0

      No, not exactly. It doesn't quite work that way here, decision process is highly structured, but it is not top down quite that way.

      OTOH, 'must not miss the deadline' and lose face is a big thing, and it's both collective and self imposed...

    8. Re:Those are not software and hardware errors -- by Anonymous Coward · · Score: 0

      Yes, but they're better because they're Japanese. Everything Japanese is better - AmiMoJo says so.

    9. Re:Those are not software and hardware errors -- by drinkypoo · · Score: 1

      Indeed. And very likely by a culture of "not contradicting the boss". An engineer that is unwilling to "contradict the boss" is a bad engineer, no matter what other skills he has.

      You're supposed to raise the issue after work, over drinks. Yes, I would also prefer to have time for my own personal life, than have to go drinking after work in order to continue working, and do the stuff you should have been able to do at work but couldn't because of societal inertia and corporate culture.

      If I weren't so concerned with what is happening here in the USA regarding labor, I'd be really and truly fascinated by it in Japan. They have a culture of make-work now, and a massive suicide rate. We have a culture of unemployment... and the suicide rate's rising. The comparative solutions to the problems of the unemployed should be fascinating. Last time Japan had excess population, they sent them to Brazil. Then, when they had a birth rate problem and needed population for labor, they reabsorbed many of them but now they treat the mixed like an underclass. Sadly, we're doing even worse, we are just pretending that there is no problem. How long can that last?

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re: Those are not software and hardware errors -- by gweihir · · Score: 1

      Interesting. The two you mention are killers as well. Makes sense to me.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:Those are not software and hardware errors -- by cwsumner · · Score: 1

      ... And there's your problem right there. What would you rather be - a "bad" engineer who can still pay the mortgage/rent, or a righteous engineer who's now looking for work and could be on the street in a few months if he doesn't get a new job?

      It is better to be fired, or quit. Then you will not be one of the ones black-balled by all of the other personnel departments, after the disaster.

      But it is a value judgement that must be made by all of us, based on the potential damage that might happen and the odds.

    12. Re:Those are not software and hardware errors -- by sysrammer · · Score: 1

      Doc: "No wonder this circuit failed; it says 'Made in Japan'" --Back to the Future

      Yeah. I noticed that too. A good laugh. 'Member when "made in China" only meant McMickey toys? I was thinking at the time that they'd follow the same arc as Japan.

      --
      His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain
  14. Re:This would be an excellent time for them to pos by Anonymous Coward · · Score: 0

    Why would it be an excellent time? None of the fuck-up dates from TFA are 30 April.

  15. Foolish Project Managers by Anonymous Coward · · Score: 0

    Been on many software projects. From experience, it sounds like a project I am on now. You can't upload untested software on the fly if you want something to work. It's hard to convince non-engineers of this but perhaps this will teach someone... I make sure I protest every time I am asked to release untested software to a production environment so they know this is really a bad decision when they follow the paper trail and the developers did everything they could to stop it from happening ahead of time.

  16. Re:Good by Anonymous Coward · · Score: 0

    First part is 100% correct. I don't know how you arrived at that conclusion though.

  17. Dating an Engineer by fahrbot-bot · · Score: 5, Funny

    It seems that untested software had been uploaded for thrust control just before the breakup.

    Note to self: Don't ask your girlfriend questions you don't want the answers to - again.

    --
    It must have been something you assimilated. . . .
  18. Cretinization of engineering by gweihir · · Score: 2

    This is just one of the more spectacular examples. I have heard of managers of large software teams that "do not believe in testing", I have seen Internet-reachable critical software that got a security evaluation only after deployment, because it was finished only a few days before deployment, and quite a few more things of similar utter incompetence. My guess is that the people responsible for these completely ridiculous screwups are "managers" that think they know how it all works (while being clueless), and that have eliminated all resistance to their views by firing anybody actually competent.

    This is a dangerous and completely unacceptable regression. Humanity needs to be good at engineering if it is to have a future.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Cretinization of engineering by Anonymous Coward · · Score: 0

      That is true but at the same time, just write software that isn't broken. Something that engineers are very bad at in general.

      For example, it's impossible for me to ever release software that is "untested". Whether official process or not, my shit is tested by me and I'm very good at breaking software. So much so that I often have to write more code than I should because I find bugs in even well established and supposedly robust debugged libraries. I mean literally every single library I have ever used, I have found bugs that break them. This is why my stuff is so good, because I punish software in ways normal people can't imagine.

    2. Re:Cretinization of engineering by Wiener · · Score: 1

      Humanity needs to be good at engineering if it is to have a future.

      So...get rid of management? Because the two are mutually exclusive.

    3. Re:Cretinization of engineering by cwsumner · · Score: 1

      ... So...get rid of management? Because the two are mutually exclusive.

      No, just the "pointy-haired managers", who are not actually managers at all!

      Real managers are necessary and helpful.

  19. Reminds me of a recent jdrama by execthis · · Score: 1

    Interesting I just watched a true-story-based jdrama about the development of rockets in Japan called "Shitamachi Rocket" which blew my mind.

  20. I was going to ask... by Anonymous Coward · · Score: 0

    Does this mean we should readjust our opinions on Japanese technology back to the 'only make shit' opinion people had prior to the... 60s?

  21. Why software blamed first? by Anonymous Coward · · Score: 0

    " star tracker experienced glitches whenever it passed over the eastern coast of South America"

    "Somewhere along the way, the problems with the star tracker caused Hitomi to rely instead on another method, a set of gyroscopes, to calculate its orientation in space. But those gyroscopes were reporting, erroneously, that the spacecraft was rotating at a rate of about 20 degrees each hour. Tiny motors known as reaction wheels began to turn to counteract the supposed rotation."

    So let me get this straight...2 major pieces of hardware failed to perform as designed yet the third leg of this stool, the control software, was identified as the cause of this catastrophe.

    To be sure, there is no excuse for untested software in any production system, let alone one so critical. However, I find the article's headline, and analysis, dubious when there were other design (hardware) based flaws that, if caught earlier, would have prevented this tragedy.

    All Engineering disciplines have design flaws that escape into "production." It's only software design flaws that get everyone so breathless. Why?

    I'm guessing only (because the article's information content sucks) that what was actually untested was this particular chain of events. My professional guess is that the software was tested except for this failure edge case. You can't test software forever. You have to ship it at some point, and you always ship it with a set of known presumptions.

    1. Re:Why software blamed first? by Anonymous Coward · · Score: 0

      It could have been the software in the gyros that was causing them to report erroneously. Maybe something like an incorrect, or more likely, incompatibly formatted message. Eg, gyro sends position in SAF17 and the control software expects SAF18.

      Source: former engineer of (probably) similar gyroscope package.

  22. Sheisse! :-( by Anonymous Coward · · Score: 0

    Sad moment for stellar science...

  23. Root cause analysis? by drwho · · Score: 3, Insightful

    I'd like to see a more thorough investigation of this set of incidents. That means no one involved gets to skip out by Seppuku. One of the problems with having a number of backup systems is that people tend to think "well, if it breaks, there's a backup system" - not realizing that each time a backup system is added, complexity is added, and that overall reliability goes down, instead of up. I don't know if over-reliance on backup systems, and failure to manage complexity, was the cause here, but it's the only thing other than "bad luck" or "sabotage" that can explain this disaster from a country which has many talented engineers.

    1. Re:Root cause analysis? by Anonymous Coward · · Score: 0

      It's inaccurate to say that merely adding a backup system causes overall reliability to go down. That's a possible outcome, but whether or not it is true for a given system very much depends on how the original system worked and what the failure modes are for both the original system and the backup. For example, giving your spouse a copy of the key for your car does not make the original key less reliable in any statistically significant way and does provide some additional recovery options if your key becomes unusable for some reason.
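
      A back-of-the-envelope way to see both sides of this argument, as a sketch under loudly stated assumptions: each unit fails with probability `p`, a fraction `common_cause` of those failures hits primary and backup together, and the added switchover logic itself brings the system down with probability `p_switch`. The model and every number in it are illustrative, not taken from any real system.

```python
def failure_prob_redundant(p, common_cause=0.0, p_switch=0.0):
    """Failure probability of a primary plus an identical backup.

    p            -- failure probability of a single unit
    common_cause -- assumed fraction of failures that hit both units at once
    p_switch     -- assumed probability that the switchover logic itself
                    takes the system down
    """
    shared = common_cause * p             # one event kills both units
    independent = (1 - common_cause) * p  # each unit can also fail alone
    both_units = shared + (1 - shared) * independent ** 2
    return p_switch + (1 - p_switch) * both_units


single = 0.1
print(failure_prob_redundant(single))                                  # fully independent backup
print(failure_prob_redundant(single, common_cause=1.0))                # identical failure modes
print(failure_prob_redundant(single, common_cause=1.0, p_switch=0.2))  # plus fragile switchover
```

      With fully independent failures the backup helps a lot; with fully shared failure modes it buys nothing; and once the switchover machinery can fail too, the redundant system can end up worse than the single unit, which is the grandparent's point.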

    2. Re:Root cause analysis? by Plus1Entropy · · Score: 1

      Not exactly your point, but in the same vein... Your comment on backup systems reminds me of a common misconception when it comes to designing seals with O-ring gaskets.

      I've heard many times: "Well, it almost seals, so if we just put a backup gasket in there it will be fine." Any O-ring design guideline will tell you that adding a backup only allows you to loosen your machining tolerances a bit; e.g. if the groove had to be X +/-0.005" deep, now it can be X +/-0.010" instead. X still has to be the same, the gasket still has to be the correct size and material, and the maximum operating pressure is still the same as before.

      It really is amazing that people think redundancy automatically means it's twice as reliable. I guess it's "intuitive" to a lay-person, but the reason we have experts is because things are often not (or even counter) intuitive.

      --
      Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
    3. Re:Root cause analysis? by Plus1Entropy · · Score: 1

      Not saying you're wrong, necessarily, but to the GP's point, there is added complexity to the system. While you say that the second key doesn't affect your key working, it does if you need the car and your spouse happens to be using it. Also, there's a second set of keys to keep track of, which could be lost or stolen and subsequently used to steal the car. So redundancy has added a new failure mode and increased the likelihood of an existing one.

      --
      Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
    4. Re:Root cause analysis? by sysrammer · · Score: 1

      Yeah. I'd say that I've seen outages caused by backups perhaps every 2-4 years. From resource limitations to outright offline for the users.

      But most software is not "critical". Only moderate efforts, if even that much, are made for redundancy, so the failures from the backups (as opposed to failures *of* the backups) tend to be taken in stride.

      Anyways, I imagine it takes at least an order of magnitude greater effort for critical software.

      --
      His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain
  24. IBM 9000 by drwho · · Score: 4, Funny

    "Well, I don’t think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error."

  25. Compiles in the lab... Ship IT! by Anonymous Coward · · Score: 0

    FastWorks at its finest. Wonder how many executives got their bonus because it launched?

  26. Re:Good by Anonymous Coward · · Score: 0

    I would kill all the trolls. It's the right thing to do. Gandalf told me so.

  27. Firing the wrong thruster is "an edge case"?? by Viol8 · · Score: 1

    I'd have thought for a spacecraft control system it would be one of the first pieces of code you'd test! It's equivalent to putting a car into Drive and finding yourself going backwards!

  28. obligatory "The Arrival." It's alien sabotage by mrflash818 · · Score: 1

    "Ask yourself why an antenna won't deploy on a deep space probe."

    "Or ask how they could launch a $6Billion telescope without testing its mirror."

    'The Arrival'

    https://www.youtube.com/watch?...

    --
    Uh, Linux geek since 1999.
  29. What to steer clear of? by transami · · Score: 1

    So... what *modern* development methodology and platform did they use?

    --
    :T:R:A:N:S:
  30. Re:This would be an excellent time for them to pos by Plus1Entropy · · Score: 1

    Lol, the T in TIFU is really more of a guideline, in that it's OK to completely ignore it. Same goes for the I.

    TIFU is really more like TITOASWIOSEFU: Today I Thought Of A Story Where I or Someone Else Fucked Up. Not quite as catchy.

    --
    Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
  31. No kidding by Anonymous Coward · · Score: 0

    What the heck is happening to Japan? You'd think they'd have better management and knowhow than this, but I guess their standards are slipping severely.

    1. Re:No kidding by Anonymous Coward · · Score: 0

      They've never been good at space tech - countless lost missions.

  32. They Did!! by Anonymous Coward · · Score: 0

    " It seems that untested software had been uploaded for thrust control just before the breakup"

  33. "probably running Linux"!? by Anonymous Coward · · Score: 0

    Microsoft shill alert! A$$hole in the room, and it is not me!

    1. Re:"probably running Linux"!? by Anonymous Coward · · Score: 0

      Microsoft shill alert! A$$hole in the room, and it is not me!

      I guess he's in good company.

  34. Re:Good by Eunuchswear · · Score: 1

    'Cos Arianespace is totally not a thing, ESA was closed down years ago and Darmstadt is only known for its football team.

    --
    Watch this Heartland Institute video
  35. Asian grit and determination by Anonymous Coward · · Score: 0

    We all know countries in the far east have great number crunchers who work hard, but space, aeronautics, the large scale stuff needs more. I worry that a future of space dominated by Asia will lead to a big orbital debris field, much like how the Pacific is becoming a big nuclear waste and refuse pool thanks to Japan. Leave it to the big boys in Europe and North America. It takes more (a lot more) than just crunching the numbers. And xenophobic, proud, stubborn, inadequate people in Japan don't help the situation.

  36. tradeoffs in flight software by Anonymous Coward · · Score: 1

    Flight software developer here (I *am* a rocket engineer, of sorts)

    Spacecraft stuff is not made in sufficient volumes to have standardized interfaces beyond the basic electrical interface. Sure, there's a 24V DC power, perhaps a few discretes, some discrete telemetry (voltage, current, temperature), and some kind of data interface (MIL-STD-1553, RS-422 serial, or SpaceWire are most likely).
    So you will be writing some custom software to deal with this almost one-of-a-kind interface.

    Typically, you are inheriting control code from some previous spacecraft as well - flight software is expensive, the stuff you have to do is pretty much the same every time, so there's a good case for re-use. And perhaps not all the corner cases of that inherited software are known. Or perhaps your little shim layer that sits between old code and new device and translates "new device format" data into "old device format" data has some issues.

    Flight software typically doesn't have lots of extra capability: you have to test it over the entire range, so it tends to be "do we have a specific requirement for that? Yes: build it and test it; No, it's nice to have: Don't build it". So your idea of "incorporate lots of flexibility against potential future devices" would be a non-starter: what requirement would you design against for that "potential future device"? How would you justify that particular requirement, as opposed to another? Say your existing software MUST handle 100 byte messages from the reaction wheel controller, and you want to say "why don't we code it for 1000 bytes to make room for expansion?" That extra space comes at a cost: memory costs money, testing to 1000 costs more than testing to 100. And ultimately, someone will say "well, why not 2000? or 500?" - unless there's some natural "breakpoint" in the cost function, there's no good rationale.

    And with any sort of self checking, you have to trade off the failure probability of the self checker.

    1. Re:tradeoffs in flight software by EmperorOfCanada · · Score: 1

      I would suspect that this tradeoff would apply to every single project, but not all the projects overall. The above screwup cost over $200 million. Thus the savings from preventing a single screwup out of even 20 projects would more than cover the extra costs.

      This is similar to an argument I often have about unit testing. Many programmers are still opposed to the idea. I actually believe that a large project without unit testing can't actually be completed. The extra effort of unit testing actually allows for ever faster progress the deeper you get into the project, as compared to the same project without unit testing, which often becomes stalled at a certain point.

      So I would think that even though launches and space vehicles aren't quite a commodity product, not standardizing decreases product quality while limiting what can be developed.

      I would think that if the bulk of a space probe could be borderline off the shelf and if they weren't quite reinventing the wheel or at least rebuilding someone else's wheel from scratch, that more effort could be put into the final mission.

      Also this would presumably decrease risk which by itself reduces cost, and would just reduce cost which either means more money for the actual mission, or more missions because of lowered costs.

    2. Re:tradeoffs in flight software by tlhIngan · · Score: 1

      Flight software typically doesn't have lots of extra capability: you have to test it over the entire range, so it tends to be "do we have a specific requirement for that? Yes: build it and test it; No, it's nice to have: Don't build it". So your idea of "incorporate lots of flexibility against potential future devices" would be a non-starter: what requirement would you design against for that "potential future device"? How would you justify that particular requirement, as opposed to another? Say your existing software MUST handle 100 byte messages from the reaction wheel controller, and you want to say "why don't we code it for 1000 bytes to make room for expansion?" That extra space comes at a cost: memory costs money, testing to 1000 costs more than testing to 100. And ultimately, someone will say "well, why not 2000? or 500?" - unless there's some natural "breakpoint" in the cost function, there's no good rationale.

      You missed out the extra testing required.

      Today your gyro takes 100 bytes. And you build your code for tomorrow's 1000 byte gyro control. Crap, you need to test your code with both kinds to make sure it handles that larger buffer even though it never will occur on the current platform. All that extra testing and all that for essentially useless code drives up costs. For flight-safety software, it already costs a ton of money to provably be correct, and now it has to be provably correct for the hardware that exists now, and correct for hardware that doesn't exist now and will not exist when the software is sent up.

      At best, you can write your code so constants and all that are reasonably self-contained so your 100 byte buffer code can be re-used with 1000 bytes later on.
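
      A minimal sketch of what "constants reasonably self-contained" could look like. The frame layout here (a two-byte sync word, a fixed payload, a one-byte additive checksum) and all names are hypothetical, chosen only for illustration, and are not taken from any real reaction wheel or gyro interface.

```python
# Hypothetical frame layout: sync word, fixed-size payload, 1-byte checksum.
SYNC = b"\xAA\x55"
PAYLOAD_LEN = 100  # the only place the message size is defined
FRAME_LEN = len(SYNC) + PAYLOAD_LEN + 1


def checksum(payload: bytes) -> int:
    """Additive checksum, truncated to one byte."""
    return sum(payload) & 0xFF


def build_frame(payload: bytes) -> bytes:
    """Wrap a payload of exactly PAYLOAD_LEN bytes in a frame."""
    if len(payload) != PAYLOAD_LEN:
        raise ValueError("payload must be exactly PAYLOAD_LEN bytes")
    return SYNC + payload + bytes([checksum(payload)])


def parse_frame(frame: bytes) -> bytes:
    """Validate a frame and return its payload; reject anything unexpected."""
    if len(frame) != FRAME_LEN:
        raise ValueError("wrong frame length")
    if frame[:len(SYNC)] != SYNC:
        raise ValueError("bad sync word")
    payload = frame[len(SYNC):-1]
    if checksum(payload) != frame[-1]:
        raise ValueError("bad checksum")
    return payload


frame = build_frame(bytes(range(PAYLOAD_LEN)))
print(len(frame), parse_frame(frame) == bytes(range(PAYLOAD_LEN)))
```

      The code is only ever tested for the one size that exists today, but a later project that changes `PAYLOAD_LEN` has a single line to touch and then re-runs its own test campaign.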

  37. 50 million and 5 years? by Anonymous Coward · · Score: 0

    Not that bad? Half the cost and probably half the time of developing Star Citizen.

  38. Summary Error, Article Error by laing · · Score: 1

    I know nothing about the specifics of this mission, but I do know something about spacecraft.

    The summary says the star tracker didn't work in "an area of low magnetic flux" (the South Atlantic Anomaly). The true issue is that the SAA is a high radiation area and the radiation caused an SEU (single-event upset) in the star tracker. The Scientific American article was a bit mixed up about dumping the momentum stored in the reaction wheels. The text is a bit jumbled, but I believe the article was referring to magnetic torque rods which produce a force vs. Earth's magnetic field, but they only work if the spacecraft is stable. The spacecraft was never stable because the IRU (gyroscopes) provided erroneous information. In the end, the ACS issue (probably a sign error) is what killed the spacecraft.
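
    To illustrate why a sign error in a rate controller is fatal rather than merely unhelpful, here is a toy single-axis simulation. It is a sketch under invented assumptions (unit inertia, a simple proportional damping law, made-up gain and time step) and is not Hitomi's actual control law.

```python
def simulate_rate(omega0, gain, sign, steps=50, dt=1.0, inertia=1.0):
    """Integrate a toy one-axis spin rate under proportional rate control.

    sign = -1 gives the intended behaviour (torque opposes the spin);
    sign = +1 models the hypothetical sign error (torque aids the spin).
    """
    omega = omega0
    for _ in range(steps):
        torque = sign * gain * omega
        omega += torque / inertia * dt
    return omega


correct = simulate_rate(omega0=1.0, gain=0.1, sign=-1)
flipped = simulate_rate(omega0=1.0, gain=0.1, sign=+1)

print(f"correct sign: rate after 50 steps = {correct:.5f}")
print(f"flipped sign: rate after 50 steps = {flipped:.1f}")
```

    With the correct sign the rate decays geometrically toward zero; with the flipped sign the same loop amplifies the spin by more than a factor of 100 over the same interval.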

  39. Well, gee how did they get here... by Anonymous Coward · · Score: 0

    http://spaceflight101.com/h-iia-astro-h/hitomi-failure-chain/

    1. They launch the spacecraft.
    2. They find out that Star tracker stabilization doesn't work in an area that is also a communications blackout.
    3. They upload a patch, continue deployment of the boom and start a reorientation maneuver (for the next image target), as they go into blackout.
    4. The maneuver completes when the satellite is in the high radiation region that is problematic.
    5. The IRU (inertial guidance) error is high presumably because of the maneuver, the Star tracker data that is supposed to override and correct it is invalid. The temporal integration algorithm that is supposed to correct the error (21.7 degree-per-hour roll) doesn't have time to work because the IRU takes action.
    6. Reaction wheels spin the spacecraft based on erroneous data. There was a limiter issue but that doesn't seem to materially matter.
    7. Because the reaction wheels are near saturation it enters safe mode.
    8. A sun sighting that is supposed to provide correct attitude doesn't.
    9. The thrusters fire without solar information (presumably using the flawed IRU information since that is the only information available). Now the OP thread article says because of software error, but the only error seems to be no source to correct the IRU information. Presumably the thrusters were trying to finish cancelling the non-existent spin the reaction wheels couldn't.

    A couple of questions:
    1. Once they found out there was a Star Tracker problem, why did they continue business as usual?
    2. Why start a reorientation maneuver on a spacecraft with orientation problems going into a blackout region that is problematic for Star Tracker, particularly with a new software patch?
    3. Can the Star Tracker even function with a high rate of spin, given the solar sensor couldn't?
    4. Why wasn't the IRU information more accurate and why wasn't it able to correct a low but false spin rate? IE the worst case should be the spacecraft rotating at 21.7 degrees per hour.

    It seems to me that spacecraft stabilization is pretty important. It doesn't seem it was that important to the ground support staff before the incident.
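
    The reasoning behind question 4 can be sketched numerically. This is a toy model with an invented gain and step count: a controller nulls the rate it believes it sees, which is the true rate plus an uncorrected IRU error, and wheel saturation (step 7 above) is deliberately ignored.

```python
import math

BIAS_DEG_PER_HOUR = 21.7  # the erroneous roll rate quoted in step 5


def settle_true_rate(bias_error, gain=0.2, steps=200):
    """Return the true rate after the controller nulls the *estimated* rate.

    Toy model: estimated rate = true rate + bias_error, and each step
    the controller removes a fraction `gain` of the estimated rate.
    """
    true_rate = 0.0  # the spacecraft starts out not rotating
    for _ in range(steps):
        estimated = true_rate + bias_error
        true_rate -= gain * estimated
    return true_rate


final = settle_true_rate(BIAS_DEG_PER_HOUR)
rad_per_s = math.radians(BIAS_DEG_PER_HOUR) / 3600.0
hours_per_rev = 360.0 / BIAS_DEG_PER_HOUR

print(f"true rate settles at {final:.3f} deg/hour")
print(f"that is {rad_per_s:.3e} rad/s, one revolution every {hours_per_rev:.1f} hours")
```

    In this idealized loop the spacecraft ends up rotating at the bias rate in the opposite direction and no faster, which is why a much higher spin suggests something beyond the bias itself went wrong.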