Crowdfunded, Solar-powered Spacecraft Goes Silent
Last week saw the successful launch of the Planetary Society's LightSail spacecraft, the solar-powered satellite that runs Linux and was crowdfunded on Kickstarter. The spacecraft worked flawlessly for two days, but then fell silent, and the engineering team has been working hard on a fix ever since. They've pinpointed the problem: a software glitch. "Every 15 seconds, LightSail transmits a telemetry beacon packet. The software controlling the main system board writes corresponding information to a file called beacon.csv. If you're not familiar with CSV files, you can think of them as simplified spreadsheets—in fact, most can be opened with Microsoft Excel. As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes—roughly the size of ten compressed music files—it can crash the flight system." Unfortunately, the only way to clear that CSV file is to reboot LightSail. It can be done remotely, but as anyone who deals with crashing computers understands, remote commands don't always work. The command has been sent a few dozen times already, but LightSail remains silent. The best hope may now be that the system spontaneously reboots on its own.
I’m usually the first to defend others when some bug like this makes it through testing. Hindsight always being 20/20, only takes one bug amongst a million good bits of code, etc. But this just seems like something that even basic testing should have caught.
Did they not run this thing on the ground for a few weeks? That’s just basic testing, especially for something that is going to be inaccessible for a while. Also, that some critical bit of processing relies on stuff being written to (and then presumably read back from) a CSV file is very worrying.
This sounds like some very shoddy work.
App appers know that apps can app 32 mega-apps without apping!
Apps!
How much is that in library of congress?
Please, I'm no nerd, I don't know this "technology" stuff.
I don't even know where to start.
I try to fund a kickstarter about once a month that has some scientific value. I nearly funded this one but figured it wouldn't have any problems funding so I went for smaller projects instead.
Money well saved!
I know the average IQ at /. has gone down over the years, but I think the explanation of what a CSV file is is slightly too much dumbing down.
You'd think that something as small as 32MB would have been tested before they launched the thing... It doesn't sound like it takes very long to fill up 32MB either
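For a rough sense of scale, here is a back-of-envelope sketch in Python. It assumes (nothing in the summary confirms this) that beacon.csv reached the limit right around the two-day mark and that "32 megabytes" means 32 MiB; the constant names are made up for illustration.

```python
# Back-of-envelope only: assumes the file hit the limit at the two-day mark.
BEACON_INTERVAL_S = 15                # from the summary: one beacon every 15 seconds
UPTIME_S = 2 * 24 * 60 * 60           # "worked flawlessly for two days"
LIMIT_BYTES = 32 * 1024 * 1024        # assumption: "32 megabytes" read as 32 MiB

beacons = UPTIME_S // BEACON_INTERVAL_S
bytes_per_beacon = LIMIT_BYTES / beacons

print(beacons, round(bytes_per_beacon))
```

Under those assumptions each beacon would have to add a few kilobytes to the file, so the real per-beacon record size is the number to check before trusting this estimate.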
Nothing like a thoroughly tested system eh? It was only going into space at an exceptionally high cost.
Wonder if they have thought about the more difficult issues: http://en.wikipedia.org/wiki/Single_event_upset
Good luck with that one guys, you will need it!
Roll your log files. I smell a DevOps debacle.
putting the 'B' in LGBTQ+
I'll never understand how groups (especially NASA) can spend millions, or even BILLIONS, on projects like these and not even complete the sorts of rudimentary testing that those of us in the professional software fields have to do every day. OK, this computer's going into space and going to run for days/months/years... whatever... so hey, maybe we should boot it up while it's still on the ground and see if it'll run for a couple of months without crashing first?
One of the Mars rovers had the same problem. It worked fine, but after a week or two it died because of a flash bug... they'd never tested it on Earth for a week straight prior to launching a billion-dollar piece of hardware?!?! What's wrong with these people? This is rudimentary stuff. You test it prior to launch for a long period of time. Then box it up and don't touch it. If you make any changes, re-test.
and you are an idiot for using it.
It came across a tachyon eddy and is at warp speed on its way to the Cardassian homeworld.
It is called a design flaw when the only way to clear a growing file is to restart an application.
@"if you're not familiar with CSV files"
... the ability (small code here) to power cycle and come back up in maintenance mode where it doesn't do anything on its own except receive diagnostic commands.
The computer also needs a sibling for fail-over.
There may be reasons those were left out that I would agree with.
I sure hope they can get this puppy lined out.
It little behooves the best of us to comment on the rest of us.
The next time you beat on MS or Apple for some flaw that hardly hurts anything just remember that your god, Bill Nye The Science Dork, and a whole crew of engineers couldn't be bothered to test a bit of code or make a Raspberry Pi reboot remotely.
LOLZzzz!!!!
Another piece of high speed debris floating around in space.
cat beacon >> beacon.csv
instead of....
cat beacon > beacon.csv
oops.
Do not look at laser with remaining good eye.
They launched it with that basic of a software bug that they already knew about? How about edit out the first line of the CSV file when you add another one and maintain a max length? Or write a backup code where if it fails to reboot, close and delete the file without rebooting.
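A minimal sketch of the "drop the first line when you add another" idea, in Python. This is an illustration only, not anything from the LightSail flight software; `append_capped` and `max_lines` are hypothetical names, and it assumes `max_lines` is at least 1.

```python
import os
import tempfile

def append_capped(path, line, max_lines):
    """Append one line, then drop the oldest lines so at most max_lines remain."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []
    lines.append(line)
    lines = lines[-max_lines:]          # keep only the newest max_lines entries
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp, path)               # swap in the new file in one step

# Demo in a scratch directory: ten beacons written, only the last three kept.
demo = os.path.join(tempfile.mkdtemp(), "beacon.csv")
for i in range(10):
    append_capped(demo, f"{i},ok", max_lines=3)
with open(demo) as f:
    kept = f.read().splitlines()
print(kept)
```

Rewriting the whole file on every beacon is wasteful on flash storage, so a real system would more likely rotate files or use a fixed-size ring, but the size bound is the point here.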
Actually this particular failure wasn't as obvious of an oversight as you may think. The reason it happened was because in an existing system one particular set of parameters were logged in miles since they weren't responsible for flight control (which NASA mostly uses metric for). Later on portions of this design were reused and an engineer decided to use the originally non-essential values as a feed into the navigation system.
The problem in this case is when you have something large and complex (a space craft) and a large organization with many projects (NASA/JPL) the younger generation tends to just rely on what's in place without doing the research they should.
That being said, there were many times this particular error could have been caught on the ground and wasn't, and that's a process failure. The "process" should have caught it.
Now get off my lawn!
FIRED! ;)
I'm unfamiliar with the compressed music file size as a metric? Is that "inna gadda da vida" length or "Her Majesty" length?
Should have used Windows; it reboots all the time! Dumb asses!
I believed that one for a long time but it turns out the conversion error was actually well-known by the engineers and a correction burn was SCHEDULED to fix it. The managers, for whatever reason, never carried out the maneuver and the rest is history. Then, they blamed the engineers.
That is the official story, yes.
I say professional because NASA screwed up a few years back with a probe to Mars when two systems attempted to communicate. One "spoke" in Kilometers, the other "Miles".
That's absolutely not like what happened.
. . . have you tried turning it off and on again? :-)
Had they used a simple Berkeley-style database like dbm, gdbm, or tdb (my favorite), deleting old records would be trivial. Of course it is a little more complicated than CSV, but CSV is for moron Windows programmers anyway. This issue tripped up the Spirit Rover as well. Unimpressive. A rudimentary risk analysis would have identified this. The project deserves to be bricked.
This is why they should use Windows instead of Linux to run the thing. Windows would have rebooted by now.
SpaceX can retrieve it long enough to hit the reboot button...
when some bug like this makes it through testing
Testing? what testing? If it compiles, it works. Every hacker knows this.
I have to say, when I read that the spacecraft ran Linux and had died, I naturally assumed that someone had left the auto-update enabled and it was busy trying to apply about 50 million kernel patches.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
and not as a verb. Using "hope" as a verb in spaceflight hasn't always gone very well in the past.
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
Shaka, when the walls fell
I think it's clear why most spacecraft are vetted so much before launch. Because Geek Squad does not go into space.
Did Bill Nye fuck this thing up? He is listed as a comedian on his wiki page, maybe he is playing a joke on you guys
How much is that in library of congress?
Please, I'm no nerd, I don't know this "technology" stuff.
6 Shakespeares... or
16.5 gzip-Shakespeares... or a whopping
22.6 bzip2-Shakespeares.
The Bard fares well by the Burrows-Wheeler algorithm, for his works are so oft-repeated he even runs on and repeats himself. "...So all my best is dressing old words new, Spending again what is already spent" as RLE (run-length encoding), and "To smother up the English in our throngs, If any order might be thought upon..." as MTF (move-to-front) transform. "We render you the tenth; to be ta'en forth! Before the common distribution at your only choice..." as encoding to Huffman and selection of the sweetest table, and "Spare your arithmetic; never count the turns. Once, and a million!..." as symbol usage stored as a sparse array.
Here is a brief video clip showing the moment the LightSail team browsed the log file to discover the error.
<blink>down the rabbit hole</blink>
They couldn't afford to pay Will Smith for that many sequels
Just two systems that do the same thing linked to the same antenna that operate in such a way that they're both not going to develop the same problem at the same time... and such that one can upload software patches to the other.
I believe this is the way a lot of the deep space probes were set up. They have a primary computer and a diagnostic computer. And while the main system drops or the diagnostic system drops they don't drop at the same time. The team on earth can figure out what is going on in one of the systems and instruct the other to fix it.
That is my understanding of how many of these systems work.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... log in.
Coming up next on Slashdot... Linux is an operating system, kinda like Windows or Mac OS, but built by a bunch of neckbeards, and uses about the same amount of space as 10 compressed music files. Some versions use less, some use more depending upon how it's configured.
Wow; I think it's time to move on from Slashdot. Taco would be spinning in his grave, assuming he was dead.
If telephones are outlawed, then only outlaws will have telephones.
A satellite running Linux is contingent upon a spontaneous reboot to function again? Great, now we'll never hear from that satellite again.
Clearly, the plan should have been to run the device on Windows 98. That way, it would only be out of commission for 49.7 days.
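For anyone wondering where 49.7 days comes from: it is the point at which a 32-bit counter of milliseconds since boot wraps around. A quick check of the arithmetic in Python (variable names are just for illustration):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# A 32-bit unsigned millisecond counter overflows after 2**32 ticks.
wrap_days = 2**32 / MS_PER_DAY

print(round(wrap_days, 1))
```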
Was there any testing done beforehand? I am talking about professional testing, not a simple button press, "oh, it works"...
Such an easy error, and undiscovered... "Expensive, this one was."
While it was ultimately NASA's failure, the problem was with software provided by Lockheed. All of the errors occurred on the ground in the US and were noticed, but not acted upon by management. Ultimately, it was the failure of a PHB, as most royal f###-ups are.
have you tried turning it off and on again?
32MB ought to be enough for anybody...
You know, if they were using Windows as the OS, they would not have to "hope" for a spontaneous reboot...
Everyone was thinking it...
Because making sure the log files dump after a certain size would be too much code right ?
We don't normally test our spacecraft systems, but when we do, we do it after launch.
Okay, first off, why make the satellite's systems log EVERYTHING? Aren't enough people listening to its broadcasts to catch "most" everything? Assuming they can remotely command the system to transmit all or part of its logs on demand, why not retain only the last n broadcasts worth of logs? Second, why in God's Vast Expanse (giggity) would they use CSV? I understand the environmental constraints probably rule out a full Linux install but is it really so hard to use something like SQLite? With a proper setup, the install as well as the log data can be kept VERY small. This would've allowed them simple purging of old data, extreme compression, and very low processing overhead.
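As a rough sketch of the "retain only the last n broadcasts" idea with SQLite, using Python's standard `sqlite3` module. The table name, column names, and `KEEP` limit are hypothetical, and an in-memory database stands in for whatever storage the spacecraft actually has.

```python
import sqlite3

KEEP = 3  # hypothetical retention limit: how many beacons to keep

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE beacon (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)

def log_beacon(payload):
    """Insert one beacon and purge everything older than the newest KEEP rows."""
    with db:  # single transaction: either both statements apply or neither
        db.execute("INSERT INTO beacon (payload) VALUES (?)", (payload,))
        db.execute(
            "DELETE FROM beacon WHERE id <= (SELECT MAX(id) FROM beacon) - ?",
            (KEEP,),
        )

for i in range(10):
    log_beacon(f"packet-{i}")

rows = [r[0] for r in db.execute("SELECT payload FROM beacon ORDER BY id")]
print(rows)
```

Whether SQLite's footprint and write pattern would suit this particular flight computer is a separate question that the thread cannot answer.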
Last week [a week is approximately the amount of time between new 'Keeping up with the Kardashians' episodes] saw the successful launch of the Planetary Society's LightSail spacecraft, the solar-powered satellite that runs Linux [Linux is like Windows for smart people] and was crowdfunded on Kickstarter [Kickstarter is a place to buy digital watches]. The spacecraft worked flawlessly for two days, but then fell silent, and the engineering team has been working hard on a fix ever since. They've pinpointed the problem: a software [software is like what you download from the app store] glitch. "Every 15 seconds, LightSail transmits a telemetry beacon packet [a telemetry beacon packet is like a tweet]. The software controlling the main system board writes corresponding information to a file called beacon.csv. If you're not familiar with CSV files, you can think of them as simplified spreadsheets—in fact, most can be opened with Microsoft Excel. As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes—roughly the size of ten compressed music files [32 MB is also approximately the size of 13 iPhone 6 selfies]—it can crash the flight system [The satellite's twitter feed blows up]." Unfortunately, the only way to clear that CSV file is to reboot LightSail [Like holding down the power and home buttons on your iPhone at once -- don't try this unless instructed by someone at the Genius Bar]. It can be done remotely, but as anyone who deals with crashing computers understands, remote commands don't always work [Like when Siri plays Billy Ray instead of Miley]. The command has been sent a few dozen times already, but LightSail remains silent. The best hope may now be that the system spontaneously reboots on its own [Like when you drop your phone in the pool and it still works].
Spacecraft C2 is a place for an RTOS, not something as big and kludged together as Linux. Hell, it shouldn't be running on a virtual memory machine at all.
Kids these days.
It's called "logrotate" and shame on you if you started the logger and didn't configure it first.
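logrotate itself is driven by a config file, but the same size-capped rotation can be sketched in-process with Python's standard `logging.handlers.RotatingFileHandler`. The file name, the 1 KB cap, and the logger name below are arbitrary demo values, not anything LightSail used.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "beacon.log")

# Roll over before the file reaches 1 KB; keep two old files, discard the rest.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=2
)
logger = logging.getLogger("beacon_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

for i in range(500):
    logger.info("beacon %d,ok,ok,ok", i)
handler.close()

files = sorted(os.listdir(log_dir))
total = sum(os.path.getsize(os.path.join(log_dir, f)) for f in files)
print(files, total)
```

However many beacons are written, disk usage stays bounded by roughly `maxBytes * (backupCount + 1)`.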
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Sure, follow those old embedded rules when we coded for Z80s and 16k of memory. And we counted cycles by hand.
Today, though, dynamic memory allocation is a reasonable thing. Granted you want to make sure it can't fail, and that "out of memory" is handled appropriately. This is non trivial, but hopefully, you have a generalized approach which can be rigorously tested, and then reused.
Hardware is cheap, software development is expensive: today it is much more appropriate to throw hardware resources at the problem and allow the software people to be more efficient. Particularly in a spacecraft, where flight hardware, while expensive compared to consumer stuff, is still cheap compared to people who are developing software for that flight hardware.
The other thing is that processors are MUCH faster today, so on the fly bounds checking is reasonable: Before pushing stuff on the stack, check to see if room is available. and, oh my gosh, what about array bounds checking to prevent buffer overflow. We're not coding for the 1MHz 1802 we used on Galileo any more.
The "totally deterministic" model of embedded must go, if we are to advance: it's harder to do correctly, but design for soft failure and recovery is a much, much better solution in the long run.
LightSail's problem, though, might be that the development team wasn't aware of the need for care. Cube-sat projects are full of aero-astro majors who have learned system engineering, and assume that data and spec sheets accurately and fully reflect the behavior of the devices they are using, because that's what they were taught.
Even the Wikipedia Autists have deprecated the moronic binary prefixes. NOBODY USES THEM.
Idiots! What did they do for a ground test, flip the switch on to see if it lit up, then power it down and declare it good to launch?
Actually, NASA had a "file system full" problem on one of the Mars rovers, almost exactly the same problem that LightSail has. Fortunately they were able to fix it remotely.
One word: watchdog
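For readers who haven't met the term: a watchdog is a timer that the main software must keep resetting ("kicking"); if the software hangs and the kicks stop, the timer expires and forces a reset. Real ones are usually hardware. The class below is only a software sketch of the logic, with a fake clock injected so it can be run deterministically; every name in it is hypothetical.

```python
class Watchdog:
    """Sketch of watchdog logic: expired() becomes true if kick() is not
    called within timeout_s seconds, at which point a reset should be forced."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock              # callable returning the current time in seconds
        self.last_kick = clock()

    def kick(self):
        self.last_kick = self.clock()

    def expired(self):
        return self.clock() - self.last_kick > self.timeout_s

# Demo with a fake clock so no real waiting is needed.
now = [0.0]
wd = Watchdog(timeout_s=60, clock=lambda: now[0])

now[0] = 30.0
wd.kick()                   # software is alive and checks in
healthy = wd.expired()      # still within the timeout

now[0] = 120.0              # software hung: no kick for 90 seconds
hung = wd.expired()

print(healthy, hung)
```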
guess what happens when you fill an NT system drive which has a dynamic swap file on it?
Genius.
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
The best hope may now be that the system spontaneously reboots on its own.
If your best hope is a combination of divine intervention and spontaneous Artificial Intelligence, I think you are royally fucked.
Help! I'm a slashdot refugee.
Unless I misunderstood the mission, the payload isn't coming back, so having a log file for post-mission review is meaningless. If they want to log anomalies, or commands, or telemetry, why aren't they sending it back? Either a continuous stream or regular or on-command bursts. In either case, there still would be no need to retain it in a file; you simply dump the buffer once it's been transmitted and start from zero. Am I missing something?
they reset spontaneously and could save this mission...
fsck#
But I can't decide if it should be modded FUNNY or INSIGHTFUL.
Even if they needed to keep this data around, I wouldn't have picked CSV as a format. The ASCII overhead is wasting a lot of their 32MB. A binary log would have been a better choice, systemd jokes notwithstanding.
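To put a rough number on the ASCII overhead, here is a small Python comparison of one record written as CSV text versus packed binary. The four fields and their values are invented for the example; the thread does not say what a LightSail beacon actually contains.

```python
import struct

# Hypothetical beacon fields (not the real telemetry layout).
timestamp, voltage, current, temp = 1432598400, 7.91, 0.412, -12.5

# One CSV line, as ASCII text.
csv_record = f"{timestamp},{voltage},{current},{temp}\n".encode("ascii")

# Same values packed as a little-endian uint32 plus three 32-bit floats.
bin_record = struct.pack("<Ifff", timestamp, voltage, current, temp)

print(len(csv_record), len(bin_record))
```

The binary form is fixed-width, which also makes a ring buffer of records straightforward, at the cost of float32 precision and of no longer being readable in a text editor.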
Suck on this, you Space Nutter BITCHES! Hooray! Your delusional Space Age fantasies are shattered again! Now cry and weep while I piss all over your tear-streaked zit-ridden faces! NOBODY. IS. GOING. ANYWHERE. EVER!
effective. Power h
Seriously, even IT lets logs rollover. You may not be able to read them easily, but they DO rollover.
The software controlling the main system board writes corresponding information to a file called beacon.csv. If you're not familiar with CSV files, you can think of them as simplified spreadsheets—in fact, most can be opened with Microsoft Excel. As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes—roughly the size of ten compressed music files—it can crash the flight system."
Eng 101. Resources are not infinite. Didn't anyone think about cycling logs? Or treating it as a circular buffer? What happened to capacity testing? Or better yet, catastrophe testing, as in: what happens when the system runs out of space? This does not look like data that is critical to keep. Critical to capture, yes, but not critical to keep. Most on-board systems, embedded systems and/or systems with minimal resources use a circular buffer to capture control events for these reasons.
This is not a web site project, but a freaking spaceship. I can see clueless developers making these kinds of mistakes in web/enterprisey systems (I know, I've seen it). I couldn't have imagined this on a much more critical type of system... but then we have the Ariane 5 incident.
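The circular buffer idea fits in a few lines of Python using `collections.deque` with a `maxlen`. This is an in-memory illustration with a made-up capacity of 4, not a claim about how any flight system implements it.

```python
from collections import deque

# Fixed-capacity event log: once full, each append silently drops the oldest entry.
events = deque(maxlen=4)

for i in range(10):
    events.append(f"beacon-{i}")

print(list(events))
```

Memory use is bounded by the capacity no matter how long the system runs.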
Unfortunately, the only way to clear that CSV file is to reboot LightSail.
A control system should by default reboot itself and clear its non-critical logs when running out of space, or at worst, keep running without logging the events. This is so trivial to test; did the system and software engineers never see a use case that captures this scenario?
It can be done remotely, but as anyone who deals with crashing computers understands, remote commands don't always work.
They don't always work if you don't test for them exhaustively... and they are not hard to test... and their continuous testing should be a priority at every release/test cycle. The engineers in this project are far more intelligent than I am, I'm sure of it. But man, this specific problem, I'm like "dude, wtf?"
If I was a paranoid person I would be suspicious that there was a nefarious entity working to prevent the use of solar propulsion. How many attempts have there been at launching/testing such systems, 5? And how many of them have failed, been destroyed or deployed incorrectly, 4? The one successful attempt was relatively small (45'x45') and heavy (694 lbs) for a solar sail craft. Even so it achieved an impressive delta-v of 400 m/s.
Nothing like adding a filesize check into the save script so you don't fill up your filesystem and crash it. That would have cost them what, two lines of code?
That's like building a nuclear weapon with no off switch. Who does that?
Did they christen this spacecraft? Did they name it the USS Eve, perhaps?
ln -s /dev/null beacon.csv
Okay, so they tried turning it off and back on, but did they check to make sure it's plugged in?
I smell a new xkcd comic..
1. Self-important, super militant, ultra snide atheist talking head Bill Nye is the CEO.
2. A female, Barbara Plante, is the "system engineer".
For all your talk, Nye, you sure dropped the ball when it actually mattered, didn't you? Where's your super intelligent science horse shit now? Great job oh-most-high-scientist.
Barbara Plante, shame on you. You just reinforced the stereotype that females can't code worth a shit. Every female engineer on the planet should hate your name right now.
Wrong on both posts.
The error was that all of the JPL software ingests data in metric units for force (Newtons). This was clearly defined in the interface control documents.
Lockheed Martin provided the data files with the units of pounds, incorrectly. So the "small forces" were off by a factor of about 4.45 (1 lbf is roughly 4.448 N). This didn't get caught because it wasn't a "many orders of magnitude" error that would cause the navigation calculations to be obviously wrong. The forces involved are a small correction in the equations of motion, and there are other random effects of comparable magnitude.
Nobody checked to make sure that the units were correct: The JPLers assumed it was in metric which it has always been (including previous missions with spacecraft from L-M); LM didn't check either.
The underlying root cause was that this was in the days of Faster Better Cheaper: JPL & L-M offered to do TWO spacecraft for slightly more than the cost of one: such a deal. The problem is that the two missions (Mars Climate Orbiter and Mars Polar Lander) were essentially simultaneous, so the people were split between projects. There's not a huge number of people around who can support this kind of effort, so they were spread thin.
Read the MCO report (it's linked on the Wikipedia page).
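The size of the pounds-versus-newtons mismatch described above is easy to reproduce. The Python sketch below is only an illustration of the unit error and of one way to make units explicit in a type; the `Force` class and the sample value are hypothetical and have nothing to do with the actual JPL or Lockheed Martin code.

```python
from dataclasses import dataclass

LBF_TO_N = 4.4482216152605  # newtons per pound-force

@dataclass(frozen=True)
class Force:
    newtons: float  # always stored in SI

    @classmethod
    def from_lbf(cls, value):
        """Build a Force from a value given in pound-force."""
        return cls(value * LBF_TO_N)

reported = 10.0                        # hypothetical figure, in pound-force
misread = Force(reported)              # the bug: treated as if already in newtons
correct = Force.from_lbf(reported)     # what the interface document expected

ratio = correct.newtons / misread.newtons
print(round(ratio, 2))
```

A factor of this size is small enough to hide among other perturbations, which matches the point made above about why nobody spotted it in the navigation results.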
If it had been a government-run spacecraft, we would hear screams about wasted money and inefficiency. Had it been foreign-government run, we would hear about nation X's decline or lack of experience.
This stuff can be found everywhere one finds young programmers raised in the era of "open source" code; too many of these people reach for other people's code and plug it all together like tinker-toys or Lego bricks instead of actually writing their own code. The worst version of this bad behavior is on steroids thanks to things like the Raspberry Pi: embedding Linux into EVERYTHING just because it has a few features you are too lazy to code yourself. When you grab other people's crap and jam it together into a blob to do something, it's far too easy to not notice all the other things that code is also doing.
I'll bet there is NOBODY on that project who has read ALL the source code for what they flew and NOBODY on that project who actually knows how all the code works. Stupid-on-steroids - the modern model of software development.
Wow, not this shit again. Non-profit or not wasting money on this stupid Carl Sagan daydream is just plain stupid. They are never going to get it to work just like the african tribes won't be developing nuclear weapons in an instance. They do not have the skills or the money to pull something like this off.
To repeat something that does not work is crazy. Planetary asshats.
In Agile, you code up to the release date and ship the code with the idea that you can rollback the code if need be. Code freeze doesn't happen nor does a QA phase. QA people don't even exist in modern agile anymore.
This is a perfect scam:
a) come up with a seemingly plausible idea obscured by high tech and/or science;
b) get investors to contribute to seeing a prototype created
c) have the prototype fail for any number of plausible reasons
d) profit
e) repeat until the profits cease to be worth the effort
The solar sail is a theoretically flawed idea, as achievable as perpetual motion. But it is a cool, elegant concept ... and people can be convinced to buy in to it. The best part is, nobody outside the scammers can verify success or failure. Perfect ...
"Consensus" in science is _always_ a political construct.
Have gnu, will travel.
That's the most basic thing any engineer will have in mind when generating files on any system: what to do when the file grows? What kind of techs would forget about it? This is a very basic mistake for this kind of project.
Radiation plus Heat/Cold.
You CAN NOT simply use off-the-shelf hardware and expect it to work well in space, even in LEO, without extensive efforts to deal with radiation and thermal excursions. Radiation generally requires the use of specially designed parts. Thermal requires actual active/passive HVAC designed in and on board. Based on testing I used to do on commercial parts (in comparison to specially designed space-grade ones), I'm actually surprised it lasted 2 days. Often we had Intel processors die within seconds to minutes of exposure in our labs under radiation or thermal exposure.
- Former "Rocket Scientist" who specialized in space craft electronics