Design, Hardware, Software Errors Doomed Japanese Hitomi Spacecraft (scientificamerican.com)
Reader Required Snark writes: The Japanese space agency JAXA said its recently launched X-Ray observation satellite Hitomi has been destroyed. After a successful launch on February 17, contact with the satellite was lost on March 28. Of the 10-year expected life span, only three days of observations were collected. Preliminary inquiry points to multiple failures in design, hardware and software. After the launch it was discovered that the star tracker stabilization didn't work in a low magnetic flux area over the South Atlantic. When the backup gyroscopic spin stabilization took control, the spin increased instead of stopping. An internal magnetic limit feature in the gyroscope failed, causing the spin to get worse. Finally, a thruster-based control started, but because of a software failure the spin increased further. The solar panels broke off, leaving the satellite without a long-term power supply. It seems that untested software had been uploaded for thrust control just before the breakup. This is a major loss for astronomical research. Two previous attempts by Japan to launch a high-resolution X-ray calorimeter had also failed, and the next planned sensor of this type is not scheduled until 2028 by the ESA. Just building a replacement unit would take 3 to 5 years and cost $50 million, without the cost of a satellite or launch.
Design, Hardware, Software Error
Oh, is that all?
Better known as 318230.
Only got 3 days of data? Damn, that's gotta hurt.
Also, the "Design, Hardware, Software Error" bit is funny in a way...I mean, what else was left to screw up? This was like the Trifecta of Fuckups.
Just cruising through this digital world at 33 1/3 rpm...
on Reddit's TIFU: https://www.reddit.com/r/tifu/
... that you find they were wired backwards.
It seems that untested software had been uploaded for thrust control just before the breakup.
See what happens when you don't disable the GWX settings.
It must have been something you assimilated. . . .
From the TFA
Dan McCammon, an astronomer at the University of Wisconsin–Madison, helped to design and build Hitomi’s premier scientific instrument, an X-ray calorimeter that measures the energy of X-ray photons with exquisite precision. He has been working on the technology for more than three decades, flying versions of it on the ASTRO-E mission, which failed on launch in 2000, and the Suzaku spacecraft, in which a helium leak rendered the instrument useless weeks after its 2005 launch.
Re-appoint your entire senior software team, especially the lead. Examine the engineering background of the rest.
Hardware fails, that's completely inevitable. Software of the kind we're talking about is meant to limit the impact of independent hardware failures, which it can do because its own failure modes can be given however many fractional 9's of perfect reliability you desire, limited only by available resources.
From the reports, it seems clear that the probe's software was not designed to do that, and the failures of process which started off the event were also not designed using defensive and self-corrective principles.
In other words, this was entirely a people problem, a failure caused by using system software designers who lack the engineering mindset and extreme caution needed when handling systems of this kind.
The poor software should actually have been caught in an external design audit in advance of launch, and in simulation. Investigate why it wasn't, and you'll probably find yet another people problem.
If the satellite is being designed and built by a government organisation, in the name of the advancement of human knowledge, should we be encouraging the software to be open source? Have there been examples of such initiatives?
Jumpstart the tartan drive.
Space is dead. It's a radiation-blasted vacuum. Nobody is going to live there. Ever. Get over it, Space Nutters. We should kill all astrophysicists and burn all scifi books. Like in Europe.
Europe got bored of that and the sport is now found elsewhere in the world. I for one welcome space nutters, since they give us something else to talk about :) I would burn the trolls, but since I don't consider myself a violent person, I will settle for making a sport of them.
Those are called political and budget pressure by managers who have no clue about engineering.
Software uploaded without testing? There is no way they could have gotten this far without testing. I am sure there is no engineer in Japan who does not test thoroughly. Actually, Japanese code is famous for being of the best quality.
This was caused by politics, bureaucracy and plain bad management.
It seems that untested software had been uploaded for thrust control just before the breakup.
Note to self: Don't ask your girlfriend questions you don't want the answers to - again.
This is just one of the more spectacular examples. I have heard of managers of large software teams that "do not believe in testing", I have seen Internet-reachable critical software that got a security evaluation only after deployment, because it was finished only a few days before deployment, and quite a few more things of similar utter incompetence. My guess is that the people responsible for these completely ridiculous screwups are "managers" that think they know how it all works (while being clueless), and that have eliminated all resistance to their views by firing anybody actually competent.
This is a dangerous and completely unacceptable regression. Humanity needs to be good at engineering if it is to have a future.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Interesting I just watched a true-story-based jdrama about the development of rockets in Japan called "Shitamachi Rocket" which blew my mind.
I'd like to see a more thorough investigation of this set of incidents. That means no one involved gets to skip out by Seppuku. One of the problems with having a number of backup systems is that people tend to think "well, if it breaks, there's a backup system" - not realizing that each time a backup system is added, complexity is added, and that overall reliability goes down, instead of up. I don't know if over-reliance on backup systems, and failure to manage complexity, were the cause here, but it's the only thing other than "bad luck" or "sabotage" that can explain this disaster from a country which has many talented engineers.
"Well, I don’t think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error."
I'd have thought for a spacecraft control system it would be one of the first pieces of code you'd test! It's equivalent to putting a car into Drive and finding yourself going backwards!
"Ask yourself why an antenna won't deploy on a deep space probe."
"Or ask how they could launch a $6Billion telescope without testing its mirror."
'The Arrival'
https://www.youtube.com/watch?...
Uh, Linux geek since 1999.
So... what *modern* development methodology and platform did they use?
:T:R:A:N:S:
Lol, the T in TIFU is really more of a guideline, in that it's OK to completely ignore it. Same goes for the I.
TIFU is really more like TITOASWIOSEFU: Today I Thought Of A Story Where I or Someone Else Fucked Up. Not quite as catchy.
Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
'cos Arianespace is totally not a thing, ESA was closed down years ago and Darmstadt is only known for its football team.
Watch this Heartland Institute video
Flight software developer here (I *am* a rocket engineer, of sorts)
Spacecraft stuff is not made in sufficient volumes to have standardized interfaces beyond the basic electrical interface. Sure, there's a 24V DC power, perhaps a few discretes, some discrete telemetry (voltage, current, temperature), and some kind of data interface (MIL-STD-1553, RS-422 serial, or SpaceWire are most likely).
So you will be writing some custom software to deal with this almost one-of-a-kind interface.
Typically, you are inheriting control code from some previous spacecraft, as well - flight software is expensive, the stuff you have to do is pretty much the same every time, there's a good case for re-use. And perhaps not all the corner cases of that inherited software are understood. Or perhaps your little shim layer that sits between old code and new device and translates "new device format" data into "old device format" data has some issues.
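To make the shim-layer risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two sample types, their units, and their sign conventions are invented for illustration and say nothing about how Hitomi's actual interfaces looked. The point is only that a shim usually has to translate both units and polarity, and that dropping either one produces a value that still looks plausible.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NewGyroSample:
    """Hypothetical 'new device format'."""
    rate_deg_s: float  # assumed: positive = clockwise seen from +Z


@dataclass(frozen=True)
class LegacyGyroSample:
    """Hypothetical 'old device format' expected by inherited control code."""
    rate_rad_s: float  # assumed: positive = counter-clockwise seen from +Z


def shim(sample: NewGyroSample) -> LegacyGyroSample:
    """Translate units AND sign convention; forgetting either is a silent bug."""
    return LegacyGyroSample(rate_rad_s=-math.radians(sample.rate_deg_s))


# A polarity check with a known input catches a dropped minus sign.
legacy = shim(NewGyroSample(rate_deg_s=90.0))
assert math.isclose(legacy.rate_rad_s, -math.pi / 2)
```

A one-line polarity test like the final assertion is cheap, which is why a sign error that survives to flight usually points to a process gap rather than a hard technical problem.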
Flight software typically doesn't have lots of extra capability: you have to test it over the entire range, so it tends to be "do we have a specific requirement for that? Yes: build it and test it; No, it's nice to have: Don't build it". So your idea of "incorporate lots of flexibility against potential future devices" would be a non-starter: what requirement would you design against for that "potential future device"? How would you justify that particular requirement, as opposed to another? Say your existing software MUST handle 100-byte messages from the reaction wheel controller, and you want to say "why don't we code it for 1000 bytes to make room for expansion?"... that extra space comes at a cost: memory costs money, testing to 1000 costs more than testing to 100. And ultimately, someone will say "well, why not 2000? or 500?" - unless there's some natural "breakpoint" in the cost function, there's no good rationale.
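As a rough illustration of "build to the requirement and test the whole range", here is a small Python sketch. The constant and function name are made up; only the 100-byte figure comes from the example above. The handler accepts exactly the envelope the requirement names and rejects everything else, so the boundary tests cover the entire supported range.

```python
# Limit taken from the (hypothetical) requirement, not padded "for the future".
MAX_WHEEL_MSG_BYTES = 100


def accept_wheel_message(payload: bytes) -> bytes:
    """Accept a reaction-wheel message only if it fits the tested envelope."""
    if not 1 <= len(payload) <= MAX_WHEEL_MSG_BYTES:
        raise ValueError(
            f"message length {len(payload)} outside tested range "
            f"1..{MAX_WHEEL_MSG_BYTES}"
        )
    return payload


# Boundary tests cover exactly the range the requirement names.
for n in (1, MAX_WHEEL_MSG_BYTES):
    assert len(accept_wheel_message(bytes(n))) == n
```

Raising the limit to 1000 would be a one-character change in the code, but it would also move the upper boundary test and everything downstream that has to be verified against it, which is the cost being described.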
And with any sort of self checking, you have to trade off the failure probability of the self checker.
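A back-of-the-envelope version of that trade-off, as a Python sketch. The model and all the numbers are assumptions chosen for illustration: events are treated as independent, and a "false trip" means the checker wrongly declares a healthy unit faulty and triggers a recovery action that can itself fail.

```python
def loss_probability(p_unit, coverage, p_false_trip, p_recovery_fails):
    """Probability of loss with a self-checker in the loop.

    Simplified model, independent events assumed:
      p_unit           - monitored unit really fails
      coverage         - checker detects a real failure
      p_false_trip     - checker flags a healthy unit
      p_recovery_fails - the recovery action itself goes wrong
    """
    missed = p_unit * (1.0 - coverage)
    caught_but_lost = p_unit * coverage * p_recovery_fails
    false_trip_lost = (1.0 - p_unit) * p_false_trip * p_recovery_fails
    return missed + caught_but_lost + false_trip_lost


p_unit = 0.01  # with no checker at all, loss probability is simply p_unit
good_checker = loss_probability(p_unit, coverage=0.9,
                                p_false_trip=0.001, p_recovery_fails=0.05)
twitchy_checker = loss_probability(p_unit, coverage=0.9,
                                   p_false_trip=0.3, p_recovery_fails=0.05)
print(f"no checker: {p_unit}, good: {good_checker:.5f}, "
      f"twitchy: {twitchy_checker:.5f}")
```

With these made-up numbers the well-behaved checker cuts the loss probability well below that of the bare unit, while the checker that trips too often makes things worse than having no checker at all.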
The summary says the star tracker didn't work in "an area of low magnetic flux" (the South Atlantic Anomaly). The true issue is that the SAA is a high radiation area and the radiation caused an SEU in the star tracker. The Scientific American article was a bit mixed up about dumping the momentum stored in the reaction wheels. The text is a bit jumbled, but I believe the article was referring to magnetic torque rods which produce a force vs. Earth's magnetic field, but they only work if the spacecraft is stable. The spacecraft was never stable because the IRU (gyroscopes) provided erroneous information. In the end, the ACS issue (probably a sign error) is what killed the spacecraft.
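For anyone wondering why a sign error is so destructive, here is a minimal single-axis sketch in Python. It is not Hitomi's control law: the gain, inertia, time step, and function name are all invented for illustration. A rate-damping controller commands a torque proportional to the measured spin rate; with the correct sign it opposes the spin, and with the sign flipped the same code feeds the spin instead.

```python
def simulate_spin(rate0, gain, sign=-1.0, inertia=1.0, dt=0.1, steps=100):
    """Single-axis rate damping: torque = sign * gain * rate.

    sign=-1 opposes the spin (intended behaviour);
    sign=+1 models a hypothetical sign error.
    All values are illustrative, not Hitomi parameters.
    """
    rate = rate0
    for _ in range(steps):
        torque = sign * gain * rate      # commanded torque
        rate += (torque / inertia) * dt  # explicit Euler integration
    return rate


good = simulate_spin(rate0=1.0, gain=0.5, sign=-1.0)
bad = simulate_spin(rate0=1.0, gain=0.5, sign=+1.0)
print(f"correct sign: {good:.4f} rad/s, flipped sign: {bad:.1f} rad/s")
```

In this toy model the correctly signed loop decays the spin toward zero, while the flipped one grows it exponentially, so every control cycle makes the situation worse rather than better.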