Software Error Caused Soyuz/Galileo Failure
schwit1 writes An investigation into the recent failed Soyuz launch of the EU's Galileo satellites has found that the Russian Fregat upper stage fired correctly, but its software was programmed for the wrong orbit. From the article: "The failure of the European Union’s Galileo satellites to reach their intended orbital position was likely caused by software errors in the Fregat-MT rocket’s upper-stage, Russian newspaper Izvestia reported Thursday. 'The nonstandard operation of the integrated management system was likely caused by an error in the embedded software. As a result, the upper stage received an incorrect flight assignment, and, operating in full accordance with the embedded software, it has delivered the units to the wrong destination,' an unnamed source from Russian space Agency Roscosmos was quoted as saying by the newspaper."
I've been hearing all this about the much-vaunted chops of these Russian coders, but frankly I don't ever see it.
I've heard American programmers are brilliant, but then a Mars probe crashed because it used the wrong units (why didn't it warn that a parameter was too low?) ... or the "cloud" services crashed due to a leap year, an HD error, an "unspecified error", etc.
I've heard European programmers are brilliant, but then Ariane explodified itself due to an overflow.
I've heard Japanese programmers are brilliant, but then the Honda thing happened, causing cars to go out of control.
They obviously haven't even heard of SQA. What gives?
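For what it's worth, the unit mix-up and the overflow mentioned above are both the kind of thing a few lines of defensive code can at least make loud instead of silent. A rough Python sketch, not what any of those projects actually ran: the `Impulse` class and `to_int16_checked` helper are names made up here for illustration.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical wrapper: an impulse is always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        # The caller must say which unit the raw number is in.
        return cls(value * LBF_S_TO_N_S)


def to_int16_checked(x: float) -> int:
    """Narrow a float to a signed 16-bit range, failing loudly on overflow."""
    n = int(x)  # truncates toward zero
    if not -32768 <= n <= 32767:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n
```

Usage would look like `Impulse.from_lbf_s(1.0)`, which forces the caller to name the unit, and `to_int16_checked(40000.0)`, which raises instead of wrapping around. None of this replaces review of the interface spec; it only turns a silent mismatch into an error somebody has to look at.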
Easy to blame things in hindsight and be all grand about it. If you haven't yet fucked up, it's because you have yet to achieve anything yourself.
This is not a SW error! The software put them right where it was told to. The orbital parameters were wrong! This is a data error, not a SW error!
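If the parameters really were the problem, the obvious defense is to cross-check the loaded flight assignment against an independently maintained mission envelope before anything flies. A minimal Python sketch of that idea follows. Everything in it is an assumption for illustration: the `OrbitTarget` and `Envelope` types, the `check_assignment` function, and the tolerance fields are invented here, and nothing is known from the article about how Fregat's data is actually loaded or checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrbitTarget:
    semi_major_axis_km: float
    eccentricity: float
    inclination_deg: float


@dataclass(frozen=True)
class Envelope:
    """Nominal target plus tolerances, kept separately from the flight data."""
    nominal: OrbitTarget
    sma_tol_km: float
    ecc_tol: float
    inc_tol_deg: float


def check_assignment(loaded: OrbitTarget, env: Envelope) -> List[str]:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    if abs(loaded.semi_major_axis_km - env.nominal.semi_major_axis_km) > env.sma_tol_km:
        problems.append("semi-major axis outside mission envelope")
    if abs(loaded.eccentricity - env.nominal.eccentricity) > env.ecc_tol:
        problems.append("eccentricity outside mission envelope")
    if abs(loaded.inclination_deg - env.nominal.inclination_deg) > env.inc_tol_deg:
        problems.append("inclination outside mission envelope")
    return problems
```

The design choice is that the envelope comes from a different source than the assignment itself, so one bad entry cannot quietly agree with itself. Whether you call the missing check a software bug or a data bug is then mostly a question of where you draw the system boundary.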
There's almost no overlap between the skills & techniques necessary to write & verify critical software (e.g. when lives or huge amounts of money are on the line) vs. what is considered to be "programming". Modern software engineering's approach to reliable system design is about where hardware engineering was fifty years ago, and about where civil engineering was 100 years ago.
SQA is a joke. Reliable systems are made using way more robust techniques, including: (a) a severely restricted state space, (b) redundancy, (c) formal proofs, (d) fully (and formally) specified interfaces, (e) random simulation, (f) several different types of coverage, (g) physics-based analysis, etc.
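To make (a) and (e) concrete, here is a toy Python sketch: a controller whose entire state space is three states with an explicit transition table, plus a seeded random simulation that checks invariants over many event sequences. The states, events, and invariants are invented for illustration and do not describe any real flight software.

```python
import random
from enum import Enum, auto


class State(Enum):
    COAST = auto()
    BURN = auto()
    DEPLOYED = auto()


class Event(Enum):
    IGNITE = auto()
    CUTOFF = auto()
    DEPLOY = auto()


# The only legal transitions. Anything not listed leaves the state unchanged.
TRANSITIONS = {
    (State.COAST, Event.IGNITE): State.BURN,
    (State.BURN, Event.CUTOFF): State.COAST,
    (State.COAST, Event.DEPLOY): State.DEPLOYED,
}


def step(state: State, event: Event) -> State:
    return TRANSITIONS.get((state, event), state)


def simulate(seed: int, steps: int = 200) -> list:
    """Drive the machine with a reproducible random event sequence."""
    rng = random.Random(seed)
    state = State.COAST
    trace = [state]
    for _ in range(steps):
        state = step(state, rng.choice(list(Event)))
        trace.append(state)
    return trace


def invariants_hold(trace: list) -> bool:
    for prev, cur in zip(trace, trace[1:]):
        if prev is State.DEPLOYED and cur is not State.DEPLOYED:
            return False  # DEPLOYED must be terminal
        if prev is State.BURN and cur is State.DEPLOYED:
            return False  # never deploy straight out of a burn
    return True
```

Running `all(invariants_hold(simulate(seed)) for seed in range(100))` exercises a hundred random sequences. With a state space this small you could enumerate every transition exhaustively instead, which is the whole point of restricting it.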
The failure of the software community to understand this distinction is why I'm scared to death about the coming world of driverless cars and robots performing surgery. How many people are going to be killed by C++ in the next decade?
This is probably something that is well understood by the engineers who are building robot surgeons (and maybe even by those building driverless cars), but it certainly isn't well understood by the overwhelming majority of software engineers. It's just a matter of time until the unwashed hordes of C++ monkeys are unleashed upon critical systems.
Bridges aren't designed and tested by "trial & error"--if they were then half of them would fall down within a few weeks. Neither are buildings or pacemakers or computer chips.
There are some scary problems with how [many if not most] software engineers see the world which don't bode well for a world where software can kill:
(a) by and large they've had essentially no exposure to any method of verification other than "trial & error".
(b) they have insufficient reverence for cause and effect because most of their bugs have really low cost (as in, nobody dies)--therefore they aren't mentally trained to make disciplined decisions.
(c) arrogance: unlike every other kind of engineer, software engineers rarely encounter the boundaries of their knowledge. A civil engineer knows when to call a materials engineer, a mechanical engineer knows when to talk to an industrial or chemical engineer, but a software engineer spends an entire career inside a carefully constructed virtual world where they can't really do that much damage.