Mars Rover Spirit Back Online
Skyshadow writes "Just in time for the arrival of its twin, the Spirit Mars Rover is back in working order. Programmers at the JPL have traced the problem to the rover's flash RAM, which it uses to maintain its filesystems. They are using a ramdisk in the rover's RAM to bypass the bad flash memory, and are working on a workaround for the bad flash. Good news, but the rover is still potentially weeks away from full operational status."
They signed up for Mars Online with 3000 free hours. What they didn't realize was that the free 3000 hours only applied to the first month of service. Once they paid their MOL bill, they got hooked back up. All the probe's friends on Mars use MOL!
They should boot faster, using linux. Then they'd only be ten seconds away :-)
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
Sounds to me like they need to send back checksums of the contents of the Flash memory and figure out if part of it got corrupted somehow. Then re-flashing that section would probably fix the problem.
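For what it's worth, here is a rough sketch of that checksum idea, in Python purely for illustration. Nothing here reflects the rover's actual flight software or flash layout: the block size, the function names, and the demo image are all made up. The idea is to checksum the image in fixed-size blocks, compare against a known-good copy kept on the ground, and re-send only the blocks that disagree.

```python
import zlib


def block_checksums(image: bytes, block_size: int = 4096) -> list[int]:
    """Return one CRC-32 per fixed-size block of a flash image."""
    return [zlib.crc32(image[i:i + block_size])
            for i in range(0, len(image), block_size)]


def corrupted_blocks(reference: list[int], reported: list[int]) -> list[int]:
    """Indices of blocks whose checksum differs from the known-good copy."""
    return [i for i, (good, seen) in enumerate(zip(reference, reported))
            if good != seen]


# Hypothetical demo: a 16 KiB "flash image" with one flipped bit in block 1.
good_image = bytes(range(256)) * 64
bad_image = bytearray(good_image)
bad_image[5000] ^= 0x01  # byte 5000 lives in block 5000 // 4096 == 1

print(corrupted_blocks(block_checksums(good_image),
                       block_checksums(bytes(bad_image))))  # prints [1]
```

Only the short list of checksums has to cross the slow link, which is the whole appeal when bandwidth to the device is scarce.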
/riff/Move over Rover, let the ramdisk take over!/riff/
Wonder wehre they got they flash ram from?
--
Its a martian virus...
You can read all about it at: Spaceflight Now - where you can continue to follow the status of both spirit and opportunity (which currently is hours away from landing).
|>>?
I think they should return the bad flash part to where they got it and exchange it for a new part... although getting the memory back to the store by the 30 day warranty might be a little difficult.
I hope they bought the extended warranty.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
it was their AOL bill that wasn't paid? hmmmm...
Linux with kernel panic...
MadPenguin.org
During all of the "Spirit is broken" columns, I kept reading /. comments saying that it was likely a memory error due to the non-consistent errors...I guess a million monkeys with a typewriter can be correct :-)
Doh!
The status has been upgraded from critical to serious condition. Opportunity will most likely have the same problem since they are twin brothers and had an identical build process. They had better figure out what is wrong with this rover before sending Opportunity to investigate its part of Mars.
Engineers guessed that Spirit's troubles were in its Flash memory and set about sending the rover a complex series of instructions to see if they could get it to bypass the corrupted memory. Theisinger said engineers sent Spirit a command just before its daily "waking up," telling it to shut down and restart in what is known as "cripple mode," using RAM instead of Flash for its start-up instructions.
Some people may take this sort of thing for granted, but I for one find it remarkable that we can essentially reboot and perhaps even fix a system that is on a whole other planet.
Just wait until we have Interplanetary, Interstellar, Intergalactic Remote Desktop. I'm only half-joking.
The coolest voice ever.
If I understand this properly, they've got a damaged filesystem on the flash RAM. Not really a big problem, you just have to send someone over to the console to boot it up in single-user mode and run fsck. ... oh yeah, sending someone over to the console is a little bit difficult here. :)
Tarsnap: Online backups for the truly paranoid
Shouldn't they have like 5 Flash RAMs? Really, they shouldn't have one of anything. In my computer if my BIOS fries, I pop open the box and replace it. If it fries on Mars, obviously I kiss my megamillion dollar project goodbye, all for a $5 Flash ROM.
Engineer 1: Ho-hum.. Little bit of ... whatever it is, 'ere... Hand me that thingamajig, will you?
Engineer 2: Yah, sure... Hey, remember that employee last month who got laid off within a week?
Engineer 1: Who? Vincent?
Engineer 2: Yeah, Vinnie... With the Italian accent?
Engineer 1: Yeah, him. What about the guy?
Engineer 2: Well, he has this offer on cheap RAM we just CAN'T resist!
Engineer 1: Really now? But-
Engineer 2: Look, our budget is already comparable to social welfare. We need to save some loot.
Engineer 1: Fair enough, buy the crap and hand me the other twisty-turny thingy over there? I need to screw on this name tag reading... "Spirit"?
Engineer 2: Look, it's either that or my wife's name.
Hate me!
If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks. VxWorks is like DOS, and early versions of Windows, where one pointer problem in one task can corrupt the whole system. Sure, we don't know that's the problem now, but it would be nice to know for sure that it wasn't.
I didn't even know they made rad-hard flash!
A similar thing happened with an old router I had. The only problem was, we needed Win98 in order to reflash it...
Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
Is there a chance that the problem could've been caused by electrostatic discharge? Rover bounces on rubber airbags on sand, bags fold up, Rover rolls off, Rover touches rock - zap!??
They should have used linux...
True genius is grasping a situation like a piece of fruit, and piercing it just right so that it drains dry.
...will apparently cause one out of every trillion bits on Earth to flip randomly... I guess with less of an atmosphere, it is a bigger problem on Mars! ;)
libertarianswag.com
someone should have just loaded up a nice copy of PCanywhere on the thing before it left.
mars rover.. what is it all about? is it good, or is it whack?
As always, go for the name brand, or high quality parts, when performance is a necessity. It goes back to the problem with government stuff: the lowest bidder always gets the contract and they tend to buy the cheapest parts.
I must admit I'm very happy they managed to get something back from Spirit. I hope they get the little guy running again. Hopefully when we get to the Martian surface the astronauts won't have to trip over the carcasses of many dead rovers.
I wonder why they didn't use any redundancy in stuff like the flash RAMs? Wouldn't that be an obvious thing to do in a mission critical system (especially when you have no way of changing the parts)?
Also, don't they use ChipKill? (ChipKill can identify a bad chip on the SIMM and bypass that particular chip, keeping the rest of the SIMM operational.)
This sig is in Spanish when you're not looking....
I do seriously wonder if these types of projects will tell us anything more than esoteric wonders of Mars, but from a strictly engineering standpoint, perhaps it's worth it after all.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Here is the link to the real story. The one given in the /. article is getting pushed down Spaceflight Now's page.
I have a friend who works in the field. Space travel hoses electronics bad. Triple redundancy and over-engineering are the name of the game. This is nice to hear. I would imagine that something went wrong in transit or on landing, but they can keep going.
Great ideas often receive violent opposition from mediocre minds. - Albert Einstein
DUPE.
Way to go, Timothy.
I remember in the last thread about the rover, someone opined that it was bad memory, then proceeded to give a half dozen reasons why. Totally nailed it.
You're all so damn smart. Sometimes I don't think I'm worthy of posting here.
What's the OS on this critter??
Next time, no more deals from gotapex.com!
I know a lot of ppl are using flash ram in smaller computers for booting linux or what not. Well if they are writing their logs and other things to that flash be aware that you can only write to it so many times before it fails.
Was NASA writing to that flash or just reading? A ram drive in flash sounds like it will access/write thousands of times a ?minute? This should wear it out quickly.
NASA has a report, and it's very bad news!
Well, bad news anyway. Bad flash? Maybe it was the solar storms. Can't they knock out flash, at least in space?
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
If I was sending an embedded control computer to another planet, I would have chosen an OS with memory protection, not VxWorks.
Actually, they might have protected memory if they use VxWorks AE RTOS/Tornado Tools 3.0. Spirit uses VxWorks, but I don't know what version they used or when they had to commit to a particular version of VxWorks.
Also, as the article mentions, memory protection adds overhead and can affect real-time performance. Hard real-time software cannot afford to have a complex layered structure and lots of conditional code that adds unpredictable delays. For that reason, many really real-time applications run very close to the hardware (for better or for worse.)
Two wrongs don't make a right, but three lefts do.
This is horrible planning by NASA! Web servers have more redundancy than a Mars rover? OK, let's see. 128 MB RAM (assuming it's a DDR SODIMM): about $100 to add an extra. 256 MB flash RAM: ~$100. Any reason why they didn't add any backups?
Jason Faulkner
Old Os Administrator
jason@oldos.org
oldos.
Damn, it sounds like freaking Microsoft. Say they know about a bug or a problem, and weeks LATER they fix it......
It's either on the beat or off the beat, it's that easy.
I moderate therefore I rule!
--
I mean, like, Beagle isn't using its flash RAM anymore, just go and jack some off it. While you're at it, TAG the Beagle with some PRO-US graffiti :) Hell, maybe it's got nicer rims too.
Seriously, can you imagine the first manned expedition seeing the Beagle jacked up, tagged, up on little Martian cinderblocks? All that and we already got a head start on building Martian cities.
I've been unable to find any hard information on the design of the MER memory systems. If anyone can point me to a technical brief I'd be very happy.
From what I've pieced together, the MER system is something like this:
One RAD6000 PowerPC CPU.
Connected, probably via CompactPCI, to 128 MB of ECC SDRAM.
256 MB of flash. No info on what make of flash, but likely Intel since they are the biggest. There was some info from the press conference that there are actually two flash chips and that the flight software is redundantly stored on each. So does this mean that there is actually 128 MB of redundant flash? Also it was said that they had problems even with the redundancy; could they possibly have overwritten something? We all know that even a redundant RAID does not stop filesystem corruption.
No information on how the flash is connected (parallel or serial?) or how the redundancy works.
Btw, I guess flash is rather radiation hard since it requires 10 - 20V to erase / write.
And if my palm locks up do I have to send it to mars and reboot it?
You can read all about it at: Spaceflight Now - where you can continue to follow the status of both spirit and opportunity (which currently is hours away from landing).
Yeah, sure. What a better way for a geek to spend a Saturday night than checking Spaceflight Now's website for up-to-the-minute reports on what some hi-tech gizmo is doing on Mars. Time to fire up the microwave and break out the extra-butter-flavor PopSecret! This is gonna be one exciting night!
They should stick with purple next time.
Congratulations to the hackers at NASA. I don't know whether to be jealous or glad that it isn't me sitting there hacking away at a machine that is sitting on another planet.
The handle on their problem is a very good thing.
Knowing about the problem before the twin lands is probably a good thing because they might anticipate the problem.
But if it takes weeks to fix, the solar panels on the lander will be degrading in the Martian atmosphere. They will miss the down time for Spirit's task list.
It must be so frustrating to sit on a possible fix and wait for a communication window, or computer response to see if you're right.
man!! it's what I call remote debugging!!
That's show him where the redundancy is!
Yay! We win.
The flash card they used on the rover must've been made in Taiwan. ;)
...and it's amazing NASA could press it at the right time from 124 million miles away (1.3 AU). Although I wonder how many times NASA did have to press it before they got the timing right -- we only know about the success :-)
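Back-of-envelope on that distance, as a quick Python sketch. It assumes the 124 million mile figure quoted above and a straight-line radio path at the speed of light; the helper name is made up for illustration.

```python
MILES_TO_KM = 1.609344          # international mile, exact
AU_KM = 149_597_870.7           # astronomical unit in km
C_KM_PER_S = 299_792.458        # speed of light in vacuum


def one_way_delay_s(distance_miles: float) -> float:
    """Seconds for a radio signal to cover the given distance."""
    return distance_miles * MILES_TO_KM / C_KM_PER_S


distance_miles = 124e6  # figure quoted in the comment above
delay = one_way_delay_s(distance_miles)
print(f"distance:   {distance_miles * MILES_TO_KM / AU_KM:.2f} AU")
print(f"one way:    {delay / 60:.1f} minutes")
print(f"round trip: {2 * delay / 60:.1f} minutes")
```

So "pressing the button" and seeing whether it worked is a round trip of a bit over 22 minutes at that distance, before counting any wait for a communication window.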
I have had some tough calls in my time but I have never had to walk a robot 283 million miles away through brain surgery. Man, I am glad I did not get that call. This is going to blow their call averages all to hell. I raise a cup of Joe to you, Rover Help Desk man.
Papa Legba come and open the gate
This is the last image received prior to the recent issues with Spirit...
I know those guys at NASA are smart... so does anyone know why they sent Opportunity right after Spirit? I would think it would be better to wait and see if any problems occurred with Spirit, and learn from their mistakes. I'm sure there is an optimum aligning of the planets for a launch... but is that really so rare that they couldn't wait?
This is the conventional wisdom, and in my experience, this particular nugget causes more embedded and real time software projects to fail than any other.
First off, on a modern PowerPC processor, memory protection (that is, without virtual memory support) can be implemented very cheaply. If you can do it just with the IBAT/DBAT registers, it should be a constant-time overhead, which is good enough for hard-real time. Oddly enough, I can't find a single reference on the net that measures the cost of memory protection alone on a modern CPU. Anyone? Anyone?
Secondly, though the rover certainly may have some software components that have hard-real time requirements, that doesn't mean that every single line of code does. Typically, less than 1 percent of the code in a real time system is hard real time. In that case, you can run the real-time code in ISRs, or perhaps in a dual-mode system, like RT-Linux, or in high-priority kernel threads (as with QNX). In any of these situations, you can run all the rest of the code in protected memory space.
Oh, shite (I'm struggling not to swear trollish gibberish!)... how can the fools even think of using flash in a space mission? What's more rad-sensitive than a bunch of trapped electrons hovering on a thin isolation layer tweaking a threshold voltage?
I wonder who the instigator is behind all the crap I do - Altan
Opportunity is fast approaching the red planet. It should be an interesting night at JPL. Excellent work guys, good luck.
[alk]
Here's a rant by a JPL guy about appropriate technologies for software on deep space probes. He recounts one story of a failed probe "100 million dollars, and 100 million miles away".
They fixed it. The fact there was a lisp REPL running on the spacecraft helped.
That's cool:
(unwind-protect
    (progn (do-science) (talk-to-earth))
  (wait-in-repl-for-earth))
....until Windows Update pushes out another Critical Update next week.
NASA needs to turn off Automatic Update. Oh and they need to uninstall MAME32....the martians are having too much fun playing space invaders.
Welcome to Slashdot. News for Nerds and 10 million experts on every damn subject on the planet. Does everybody here suddenly have a PhD in engineering and space technology? Because I sure haven't. Every whacked out theory gets +3 for Informative now.
A normal commodity Flash memory has around 10,000 erase cycles minimum. You should be very safe in assuming that whatever NASA is using that it's got plenty of erase cycles for whatever tasks it's going to do.
A programmer may miss out a - in code or mistake miles for kilometres but somebody would have to be seriously looking the other way for their entire life if they'd specified something which would die this easily.
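To put rough numbers on the erase-cycle worry, here is a small Python sketch. Only the 10,000-cycle figure comes from the comment above; the block count, the write rate, and the function name are hypothetical, and real flash controllers are messier than this.

```python
def days_until_wearout(erase_cycles: int, num_blocks: int,
                       block_erases_per_day: float,
                       wear_leveled: bool = True) -> float:
    """Crude lifetime estimate for a flash device.

    With ideal wear leveling the erases are spread over every block;
    without it, the worst case is that one block takes all of them.
    """
    budget = erase_cycles * num_blocks if wear_leveled else erase_cycles
    return budget / block_erases_per_day


# Hypothetical workload: one block erase per minute, 2048 blocks on the part.
erases_per_day = 24 * 60
print(days_until_wearout(10_000, 2048, erases_per_day))                      # leveled
print(days_until_wearout(10_000, 2048, erases_per_day, wear_leveled=False))  # hot spot
```

Under those made-up numbers the hot-spot case dies in about a week while the wear-leveled case lasts roughly 39 years, so how the filesystem spreads its writes matters far more than the raw cycle rating.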
where the russian cosmonaut says "American components, Russian components. They're all made in Taiwan!"
...a Beowulf cluster of two working rovers! Go NASA!
I find it very difficult to believe that given the money involved that NASA used Commercial Off The Shelf (COTS) componentry for something this critical ?
:-)
Surely they have enough pull to have custom semiconductors designed and manufactured for this type of task ?
I would have expected some seriously 'on-die' redundancy for all the OS boot media as well as RAM - preferably with triple-redundancy and bit level voting for all bits.
Oh well, there's always the next time
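The "triple redundancy with bit-level voting" idea is easy to sketch. This is a toy Python illustration of the general technique, not a description of what the rovers actually do; the function name and the demo word are made up.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit is the value that
    at least two of the three stored copies agree on."""
    return (a & b) | (a & c) | (b & c)


stored = 0b1011_0010
copy_a = stored ^ 0b0000_0100   # one upset bit in copy A
copy_b = stored                 # copy B intact
copy_c = stored ^ 0b1000_0000   # a different upset bit in copy C

recovered = majority_vote(copy_a, copy_b, copy_c)
print(f"{recovered:08b}", recovered == stored)  # prints 10110010 True
```

The scheme only fails when two copies get the same bit wrong, which is why it helps against random upsets but not against a bug that corrupts every copy the same way.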
Remember, when we launched this probe, no sooner than we had it in space, butt-nekid out there, we get assaulted with every kind of solar flare imaginable. So many I lost count, but they were the subject of numerous slashdot stories.
Kudos, NASA, that this thing works at all!!!
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
How about just simple ECC?
Maybe I'm crazy but the systems I run that have ECC are incredibly stable even when using Windows. My Alpha got 100+ days uptime with daily use on Windows NT4. I have an old Xeon that easily did 40 days, and I shut it down by mistake.
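For anyone wondering what "simple ECC" buys you, here is a toy Hamming(7,4) code in Python: 4 data bits plus 3 parity bits, able to correct any single flipped bit. Real ECC memory typically uses wider single-error-correct, double-error-detect codes over 64-bit words; this is just the smallest instance of the same idea, and the function names are made up.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[int, int]:
    """Return (data nibble, corrected position or 0 if no error seen)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4   # syndrome spells out the bad position
    if pos:
        c[pos - 1] ^= 1          # flip the offending bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, pos


word = hamming74_encode(0b1011)
word[4] ^= 1                     # simulate a single-bit upset at position 5
print(hamming74_decode(word))    # prints (11, 5)
```

Two flipped bits in the same word would be miscorrected by this toy code, which is why production ECC adds an extra parity bit to at least detect that case.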
Imagine you could decide WHEN you die. Wouldn't that be great? Biological engineering science will allow us to do just that. What do you prefer? To see a few unmanned missions going to Mars or ACTUALLY going to Mars? The battle against time is on. Getting the anti-age pill should be top priority for any geek. Imagine unlimited time of geekness? Bliss. Unlimited space exploration for your OWN consciousness!!! Every time I hear space news, that's what I think about....
Damn! Where's a Wal-Mart when you need one?
To-do List: Receive telemarketing call during a tornado warning. Check.
I don't know that much about VxWorks, but I heard that one of its main assets is having a very small, tight multitasking kernel.
They were able to regain the system, despite loss of a major computational component. Remotely. Through a debug link. That sure says a helluva lot for the robustness of the OS and how they configured it.
Good job, JPL.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Damn Macromedia!!
The Spirit is willing, but the flash is weak.
(Posted by Jane Slee and John Stracke in separate usenet postings.)
Wonder wehre they got they flash ram from?
From the same place that you got your typing lessons, apparently.
Ba-da-boom. I'm here 'til Tuesday.
They didn't use Windows CE. Remember the diplomat months back that got locked in his 7 series BMW because of a computer crash? :)
There is a big difference between standard flash and radiation hardened flash. In fact we are designing a project with one of these VME bus units as a storage array.
I wonder if they store the initial, by definition very critical boot section in flash or another really non-writable ROM?
Does anyone know further facts about that topic?
The present series of orbiters/landers (Nozomi, Mars Express, Spirit, Opportunity) were launched at such a time as to take advantage of the most optimal Mars-Earth configuration for something like 60,000 years. I believe the bottom line is that it was a time you could get the most science there for the least cost of launch.
Shame on my fellow American who said we should strip Beagle 2 and leave it up on cinderblocks. If Beagle is ever discovered to have soft landed, I would think the only proper thing to do would be to restore whatever's wrong with it, and let it complete its mission. (HAL, V'Ger, anyone?) Given the discussion of things like the effects of radiation exposure on electronics, you'd just have to be interested to know what a 50-or-150-year-old "dead" lander might be able to wake up and do.
If Spirit's problems aren't resolved, the Mars Scorecard should at least reflect that Beagle was the less expensive failure.
(Disclaimer: I visited England for the first time last year, and falling in love with the whole place doesn't begin to describe it. R.I.P. Beagle 2. *sniff*)
They needed to be in the same launch window, otherwise you would have to wait a long time to send another.
So... I wonder if they'll consider validating MRAM more quickly if Flash is found to be more error prone.
You know how NASA works. The Space Shuttle running on 486's and whatnot. I understand the science behind that reasoning, as sad as a 66 MHz processor seems to us geeks nowadays, but I wonder if MRAM will prove more flexible and stable for future space missions.
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
Redundant Array of Expensive Flash Rams?
Consider that an interstellar probe will take years to receive updated instructions. By which time, any fix will probably be irrelevant. Plus if they're more than 30 light-years away (practically next door by galactic standards) the guy who sent out the instructions probably won't live long enough to find out if they worked!
The mars rover seems to have problems. I read that the rover was rebooting itself at least sixty times a day.
....... hello, welcome to microsoft international ..... hello welcome to ..... hell0 ..... hel-l0 .... hell .... no ....
This got me thinking about what a hypothetical service call to microsoft about the problem might be like.
ring.... ring.... ring......
Microsoft:
hello, welcome to microsoft international automated service sys
microsoft int
Your call will be answered in just a moment by Microsoft's advanced, state of the art fishy intelligence system designed to give you the
illusion of speaking to a real trout while preserving all that you have come to expect from microsoft technology. Please speak naturally to the oi vey when your call is answered.
pause,
Hi there, my name is sivekenanda vishnu ramakrishna singh, this is not my real name but rather a randomly generated name designed to prevent you guessing my country of origin; you can call me art. Your call has been routed to me by microsoft artificial intelligence routines which
have examined your internet traffic for the last three years and determined you are in paris, poland and speak yiddish but due to a minor
system anomaly in my programming I am speaking to you in english.
Please accept feltup condolences from everyone at microsoft on the death of your piano.
How may I be a pittance to you mrs calabash?
Caller:
Well, my name is spike mcman and I'm calling you from the jet propulsion laboratories in pasadena california. We seem to be having a little
problem with one of our spacecraft on Mars, you see it's rebooting itself more than 60 times a day.
Microsoft:
And your problem is?
It's named Opportunity.
They shoulda used some decent Sandisk or Viking flash memory instead of that lousy Mr. Flash crap.
This space available.
Sandisk Compact Flash is as good as anyones. It is their SD flash that is terrible.
jpl:~ sokeefe$ ssh spirit
sokeefe@spirit's password:
Today is Prickle-Prickle, the 24th day of Chaos in the YOLD 3170
Welcome to Spirit!
spirit:~ sokeefe$ mkdir /tmp/ramdisk0
spirit:~ sokeefe$ mke2fs /dev/ram0
spirit:~ sokeefe$ mount /dev/ram0 /tmp/ramdisk0
spirit:~ sokeefe$ dd if=/dev/flash1 of=/tmp/ramdisk0
spirit:~ sokeefe$ reboot
Connection to host lost.
sokeefe@jpl:~ sokeefe$
Flash RAID array.
(Can this even be done?)
Weeks of coding saves hours of planning.
If you don't get it on cable, you can watch NASA TV here.
Litigious bastards
JPL, Pasadena -- January 21, 2004
Ground controllers were able to send commands to the Mars Exploration Rover
Spirit early Wednesday and received a simple signal acknowledging that the
rover heard them, but they did not receive expected scientific and
engineering data during scheduled communication passes during the rest of
that martian day.
Project managers have not yet determined the cause, but similar events
occurred several times during the Mars Pathfinder mission. NASA scientists
suspect, however, that the re-use of AOL floppy disks to hold the
mission-critical computer software may have contributed to the failure. A
scientist who requested anonymity said "We were just trying to save a little
money. We should have stopped using them after the shuttle disasters."
Full details on the rover's status will be described in the next daily news
conference Thursday at 9 a.m. Pacific time at the Jet Propulsion Laboratory,
which will be broadcast live on NASA Television.
Eh ... I always carry a couple of flash cards when I go out on a photo shoot with my digital camera (on Earth). If I was sightseeing on Mars I'd go whole hog and bring half a dozen in my kit. The air fare is so ungodly expensive.
Everybody knows these things crash every now and then and you need to carry spares. I hope Spirit is set up so it doesn't just have one flash RAM device. I'd hate to lose those pictures.
Was the Flash Memory made by Simpletec?
Okay, now for the back story so fellow Slashdotters are not confused. I work for a consumer electronics manufacturer, and our device uses MultiMediaCards. Well, we had to design our own MMC cards because those produced by SimpleTec fail way too often for our quality controls. Our techs are always fielding calls about applications closing due to poor memory cards.
100 days uptime is "incredibly stable"? *snrrk*
Don't write code that fucks with other processes' memory.
What is the OS supposed to do? "Hey, driver that controls the communication chip, you die!!!" Then what? No link to Earth.
Just write good code, test it, review it.
With all the radiation and very high energy particles zipping through the spacecraft on its way there, I'm surprised any computerized spacecraft get anywhere intact.
"It's so convenient to have a system where everyone is a criminal" - A. Hitler
"0354 GMT (10:54 p.m. EST Sat.)
Opportunity is currently 8,268 miles from Mars, traveling at 7,758 miles per hour. The craft's speed will continue to increase as the Martian gravity pulls Opportunity to the planet."
Right on target. To be confirmed by bright flash in few minutes.
I doubt that we will ever figure out - and I suspect that even if we did figure out we couldn't do much about it
Mission control feeling effefcts of /.'ing
Comun icationss di tvie
[No Carrier]
Oh, and he has another quote I liked too:
But maybe I just like it because thats how I tend to fix things too
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
Couldn't NASA just hit the "Skip Intro" button in the corner to avoid all these problems?
it was paraphrased!
Also... don't forget duct tape and WD-40 (or a paper clip and a straw for MacGyver).
The Martian probe "Spirit" has been captured and subverted by the Martian vampires. They have now inserted their virus code into the "debug" data received from Spirit. What security measures has NASA applied to the tainted data?
Time will tell if the vampire digital virus does to computers what their live virus does to humans. When the Net is swarming with zombie processes that can only be killed -SILVER, the biters' shadow will have fallen across our whole planet. Meanwhile, work on the SOLASER (Solar Optical Light Amplification by Stimulated Emission of Radiation for Killing Vampires) proceeds at a feverish pace. Flooding Earth's corrupted fiber veins with the beneficient bandwidth of the Sun might just keep us out of their icy clutches, and a longrange transorbital freespace burst might even take them out decisively. If you're pulling allnighters in a NOC, or have some surplus SDI gear, please volunteer for betatesting the SOLASER. Stake 'em & bake 'em!
--
make install -not war
Your six minutes of hell are nearly over.
This is one of the funniest things I've ever read on Slashdot.
If Jesus wants me it knows where to find me.
The system is rebooting no matter which flash memory is being accessed; it has the same bug both ways. So the flash RAM itself looks to be OK, but the interface between the flash RAM and the software looks to be causing resets. Even if there were more backup flash RAMs, it looks like they'd still have this problem. But then sending two rovers would also negate problems, and that's just what they've done.
Except, isn't the flash interface mechanism identical on Opportunity? Is this a design flaw or a flawed instance? I wonder how they will circumvent this same scenario with Opportunity...
Besides robot exploration software would be handy right here. It would be neat to be able to send a research bot out in the deserts, deep oceans and jungle canopies of the world. Machines can go where we can't.
Individually you can be damn annoying sometimes, but I'm constantly amazed and delighted by the collective intelligence of the /. pack.
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
The Opportunity spacecraft has stopped bouncing and has come to rest on the surface of Mars safely! At this time, it appears that the landing was flawless and everything occurred as expected. Al Gore and Arnold witnessed the event with NASA from the JPL. After landing, the spacecraft bounced and then rolled for several minutes (the extended roll was due to the flatness of the landing site). Initial diagnostics performed by the spacecraft detected no faults.
but they both fail, so they think it might be the I/O channel or the I/O bus that is damaged.
You have two or more running in parallel. While one is running, the next reloads from ROM. When it's loaded and synchronized, you switch to it, and load the next one. You do that in series, over and over, so you're only using any particular FPGA for a couple of seconds at a time, and their configurations are constantly being refreshed. It's a very simple idea that can be done now.
--- Ban humanity.
Opportunity has confirmed landing :)
wahooo
We should begin to designate all sites of landers and moon landings as historical sites. No entry within some distance. Otherwise some kid will stomp man's first footprints on the moon out of existence. All the landers on mars will be recycled into local products for the first colonists. History is now and for the future.
When I was young, I had to rub sticks together to compute.
I just noticed my Mars rover also has a bad flash. Anyone else? I smell a class action lawsuit on this one.
Sincerely yours,
Colin
OK, you dorks (you know who you are) need to stop postulating about the memory failures having to do with static electricity, martian dust, or lack of redundancy. This is JPL and (the one case of metric vs. standard aside) they thought of all the obvious stuff during the design stage. Do you really think they're slapping their foreheads and saying "the dust! we forgot about the dust!" over in the design lab? Get real, people.
If a job's not worth doing, it's not worth doing right.
it has apparently landed on its side petal. That's okay, because it was designed to flip itself to an upright position no matter how it lands.
-
[Douglas Quaid seeing his real personality on the screen]
Douglas Quaid: Now get your ass to Mars.
I couldn't believe my ears when one of the guys at the 12:00 PST briefing said: "It's a *S*oftware *H*ardware *I*nterface *T*hing"
I should have added MOL insurance to the Martian Automobile Association brochure.
Those MOLiens get the IntergalacticNet so screwed up, their email doesn't work right, IE is messed up, and when they installed MOL on the rover it started saying:
"You've Got Rocks!"
Saskboy's blog is good. 9 out of 10 dentists agree.
It is likely that we don't even need flight to transcend humanity. Isolate the area of consciousness in the brain, and put it in charge of a silicon computer. Then it's trivial for a single highly protected, highly intelligent sentient being to go to Mars (if he would even want to).
-Libertarian secular transhumanist
Now that Opportunity has arrived safely maybe it can run over and hit control-alt-delete for the poor little thing? (Will we ever really know the truth, that Spirit is suffering a case of BSOD?)
One more and they'll have Huey, Dewey and Louie..
Of course the politicians showed up. If the first one failed, do you think they'd have been there for this one?
What they should have done is tested the ram on earth
Are you kidding? You cannot possibly be serious. They should have tested it? This is freaking NASA. There is no doubt in my mind that every piece of flash ram was tested 50 gazillion times and simulated more than that. They test everything that goes through there more than most people would, and with good reason (i.e., their stuff has to work in space). I remember reading that it takes em like a couple years to get a new meal approved to go up, to test it out. *Years* for a freaking meal, and you think they just sent up some random flash card they bought at best buy?!?! I doubt it.
Plus, if you had bothered to read most (or any) of the reports out there, you'd see that it doesn't seem like it's defective flash ram; it's more likely the interface between the software and that ram.
I hate to be mean, but god you're a moron. The fact that anyone would moderate your drivel as "insightful" is just scary.
"Hard real-time software cannot afford to have a complex layered structure and lots of conditional code that adds unpredictable delays."
VxWorks is the worst of both worlds: not only does it lack memory protection (unless you use AE), but its "hard" real-time performance is not so good either, because system calls cannot be interrupted.
A modern operating system such as INTEGRITY has excellent latency characteristics as well as memory and CPU bandwidth protection...
BTW I am not a salesman for Green Hills, just a humble engineer who recommended them for his embedded PPC project, only to be overruled by non-software people, and ended up with VxWorks... which, by the way, has done its job on this project very well. I would still be keen on trying something a little more up to date, but overall WRS has come to the party on most major issues. The only problem is that you have to ask, and know what to ask, to get a result. They are just as clueless/lazy when it comes to finding problems as you are... unless you are NASA, of course.
We never bought FFS though, maybe that was a good decision! hehe
I bet they wish they had taken out an on-site warranty on that flash memory now.
Saviour of the Universe!!
[with apologies to 1930s movie serials and Queen]
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
That's even worse than talking about man-uals and other sexist remarks, the insensitive clods!
I note also that RAM is a male sheep. Why not EWE? Because NASA is full of women and disabled person oppressors that's why!
Pull their funding and disband NASA.
The Machine stops.
I was going to mod this one up, but I decided to give this reply some more emphasis by actually replying with some thoughtful encouraging words instead.
It would be nice to be able to have some folks at JPL throw down the source code and engineering schematics and say to the geek/space/engineering community at large "We have a problem here and could use your suggestions to see if we can get this fixed."
This (the Mars missions) is obviously a big hit, as measured by replies on Slashdot, the number of hits on the JPL website, stories in mainstream media, and other reasonable metrics to gauge the popularity of a project. I'm sure that there are several geeks out there who wouldn't mind digging into the source code.
The only reason I could see for the engineers not wanting to do that is that it would open them up to obvious scrutiny for poor engineering and coding. (Whadda you mean the global variable named temp is the only variable? We also have temp2, temp3, and temp4. What do the numbers in those mean? You can get it from context, can't you?) That, and some people just aren't used to allowing others into their "domain".
Being 100% funded by public money is a further reason why this should be opened up. I also totally agree.
Speculation is one thing, diagnosis and debugging is another. /. may be full of kibitzers, but not a damn one of us who doesn't work for NASA did a thing, really. In the end, even those who hit it "right on," what did they actually know about the situation? Next to nothing, and this means that their solution was a guess. Just because the guess was correct does not mean anything merit-worthy was done by the guesser.
Nasa engineers, 1: Slashdot, 1/2 (a lucky guess is worth something, I suppose)
blog
Sounds a lot like safe mode to me. Let's hope it works a hell of a lot better than safe mode or we're screwed.
As someone who has programmed VxWorks (including AE) for several years, I can say AE is a buggy piece of crap. We moved to AE for our project and eventually had to dump it since it was so buggy and slow. Also, as far as flash filesystems go, VxWorks ONLY SUPPORTS FAT, and not even FAT32, so it isn't a very robust filesystem. Not only that, because it's FAT there is no wear-leveling support. I believe there isn't an equivalent of chkdsk either. I also imagine that it can't handle faults in the filesystem (as if anything could ever deal with faults in a FAT filesystem very well).
With VxWorks you can often get away without any filesystem because all the code is linked together in one big monolithic file. Separate tasks are not separate files (although you can have loadable object files).
Yes, AE does provide memory protection domains, but it still doesn't clean up after a task dies. Sure, you can free the memory, but not open files, semaphores, pipes, or other things, because those resources usually aren't associated with the task that created them. So if you have a task that acquired a semaphore and dies, that semaphore will never be released. Malloc in AE is improved over the braindead implementation in standard VxWorks, but it still has a long way to go.
Hell, Wind River couldn't even get malloc right! Their malloc has got to be the worst implementation I've ever seen! They place free blocks in sorted order (smallest to largest) in a linked list after attempting to combine a new free block with neighboring free blocks. The next time you allocate, it walks the entire linked list until it finds a block large enough! In our case we wound up with tens or even hundreds of thousands of small blocks causing our watchdog timer to kick in because malloc became impossibly slow. AE improves this to use a tree instead of a list, but it still fragments. I ripped out the Wind River implementation and replaced it with Doug Lea's dlmalloc and all our malloc problems were solved, and the fragmentation went from tens of thousands of fragments to only a few dozen.
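To make the cost of that list walk concrete, here is a rough Python sketch. It is not Wind River's code, and the heap contents (100,000 16-byte fragments plus one 4096-byte block) are made up for illustration. It counts how many free blocks a front-to-back walk of a size-sorted list has to visit before it finds one that fits, and uses a binary search over the same sorted sizes as a stand-in for the kind of tree lookup AE is described as using.

```python
import bisect

def first_fit_steps(free_sizes, request):
    """Walk a size-sorted free list front to back.

    Returns (blocks_visited, index_of_chosen_block); the index is None
    if no block is large enough.
    """
    for visited, size in enumerate(free_sizes, start=1):
        if size >= request:
            return visited, visited - 1
    return len(free_sizes), None

# Hypothetical fragmented heap: many tiny blocks and a single large one.
free_list = sorted([16] * 100_000 + [4096])

# Linear walk: every small fragment is inspected before the big block is found.
visited, index = first_fit_steps(free_list, 1024)

# Binary search over the same sorted sizes finds the same block in ~17 probes.
tree_index = bisect.bisect_left(free_list, 1024)

print(visited, index, tree_index)
```

With tens of thousands of fragments, every large allocation pays for the whole walk, which is consistent with the watchdog trouble described above.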
For an RTOS being pushed for networking it isn't very good there either. It comes with an ancient BSD TCP/IP stack. If you have a device and want to see if it runs VxWorks, just run nmap against it. If it says TCP sequence number guessing is trivial, you can bet it's probably running VxWorks.
In today's world, VxWorks doesn't cut it any more. Any complex project should choose a real OS like QNX or even embedded Linux over VxWorks. For realtime, Linux usually isn't very good, but Timesys appears to have solved that problem nicely.
VxWorks isn't even that good at realtime. Usually you can't get any better resolution than half the system tick rate (usually 10ms), so you can't get better than 20ms of resolution in many cases.
I've also heard many rumours that Wind River is dropping AE, or at least not pushing it. We're not the only ones to have been burned by it. I've heard of only one other company that used it, and they were also burned. I think it was a startup that went out of business.
In VxWorks, all tasks share the same memory space. Think of every "task" as really a thread and you get the idea. In other words, if a "task" dies, the only way to clean up the system is to reboot.
Also, VxWorks doesn't scale. The more tasks you have, the slower it runs (i.e. no O(1) scheduler). And with the shared memory, the more complex the code, the harder it is to debug and develop a stable system.
QNX would have been a much better solution. In QNX, the core OS is very small, and if a task dies it can easily be restarted. In QNX, everything is a task with memory protection. The TCP/IP stack is separate from the core OS, for example, as are all the other drivers. If a driver crashes, it won't take the OS with it. Context switching in QNX is also very fast, faster than VxWorks even though memory protection is involved.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
So my roommate worked for an engineering firm, and his job was designing an interface for a piece of circuitry that used flash memory to hold its programming. He sent some data to the circuit, and the thing stopped responding. He checked it inside and out, looked at the data he sent, checked the whitepapers about the electronics, and came to the conclusion that, because he had sent improperly-formatted data to the circuit, the little controller chip freaked out and deleted everything on the flash memory. He didn't send "delete everything please" or "wipe the flash" data, just something like "0225" when the chip was waiting for "225". If your circuitry isn't robust enough to withstand improperly-formatted data to a very small degree, you should probably redesign it to be more reliable.
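For what it's worth, the "0225" versus "225" case is the sort of thing a strict parser on the receiving side can refuse outright. The sketch below is purely hypothetical: the three-digit, no-leading-zero command format and the name parse_command are assumptions for illustration, not anything from the actual controller. The point is only that malformed input gets rejected before any action is taken.

```python
import re

# Hypothetical command format: exactly three digits, no leading zero.
_COMMAND = re.compile(r"[1-9][0-9]{2}")

def parse_command(raw):
    """Return the command as an int, or None if the input is malformed."""
    if not _COMMAND.fullmatch(raw):
        return None  # reject; the caller should ignore or report it
    return int(raw)

print(parse_command("225"))   # well-formed
print(parse_command("0225"))  # one extra leading zero: rejected
```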
Not that NASA did the same thing; when NASA builds a piece of equipment to last, it almost invariably does. That only 17 people have died in the thousands of launches and recoveries the space program has been through is a testament to the engineering abilities in the organization.
I can tell you that AE is in many ways WORSE than the standard VxWorks. It has a lot more bugs and is quite a bit slower. Think of regular VxWorks with memory protection hacked in, not designed in from the ground up.
As a VxWorks programmer for the last 5 years, I can honestly say VxWorks is a PoS that is losing market share at a tremendous rate to the likes of embedded Linux and QNX. Wind River decided to spend tons of money buying add-in products like Routerware instead of improving their RTOS. It was a huge waste of money and now they're paying for it. They're losing money hand over fist and have had a lot of layoffs lately. They were good at one time, but they have fallen far behind the curve now in embedded RTOS design, especially for complex systems.
VxWorks comes with support for a FAT flash file system, a completely broken malloc implementation, an ancient BSD TCP/IP stack, poor RT support, no memory protection, and no way to clean up after a task that dies. Not only that, it usually costs a fortune, but I've heard they're willing to sell it very cheap now because they're desperate.
I looked into embedded Linux for our next generation hardware and software and Timesys appears to have a very nice solution with hard real-time support. The kernel is fully preemptable using semaphores instead of spinlocks and has priority inversion support. They also offer resource reservations, so I can say "I want this task to be guaranteed 5.73ms of execution time every 9.8ms" where after 5.73ms the task either gives up the CPU entirely, or else changes to a non-RT priority to not starve other tasks. It's really quite clever. Not only that, unlike RT-Linux there isn't a separate API for RT vs non-RT tasks. Monta Vista Linux is soft real-time. It cannot guarantee context switching time, nor does it deal with priority inversion. In RT, priority inversion can be a major problem (see the first Mars rover for an example).
For an example of priority inversion, say you have three tasks: a low-priority, a medium-priority, and a high-priority one. The low-priority task acquires a mutex semaphore to protect a critical section and starts processing. It is interrupted by the medium-priority task. Meanwhile, the high-priority task unblocks and attempts to grab the mutex. The high-priority task now stays blocked until the medium-priority task itself blocks or finishes, because only then can the low-priority task run again and release the semaphore. A common solution is priority inheritance. With priority inheritance, as soon as the high-priority task attempts to acquire the mutex semaphore, the low-priority task has its priority bumped to that of the high-priority task until it releases the semaphore. In this way, the low-priority task will interrupt the medium-priority task, so the high-priority task won't have to wait as long.
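The three-task scenario can be played out in a toy tick-based simulation. This is only a sketch: the arrival times and work amounts are invented, the scheduler is a simple "highest priority runnable task gets the tick" loop, and none of it is VxWorks or Timesys code. It reports when each task finishes with and without priority inheritance.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int      # larger number = more urgent
    arrival: int       # tick at which the task becomes ready
    work: int          # ticks of CPU still needed
    needs_mutex: bool  # True if all of its work is inside the critical section
    done_at: int = -1  # tick at which it finished (-1 = not finished)

def simulate(inherit):
    """Run the scenario; return {task name: finish tick}."""
    tasks = [
        Task("low", 1, 0, 3, True),
        Task("medium", 2, 1, 5, False),
        Task("high", 3, 2, 1, True),
    ]
    owner = None  # task currently holding the mutex
    for tick in range(50):
        ready = [t for t in tasks if t.arrival <= tick and t.work > 0]
        if not ready:
            break
        # A task that needs the mutex is blocked while another task holds it.
        blocked = [t for t in ready
                   if t.needs_mutex and owner is not None and owner is not t]
        runnable = [t for t in ready if not any(t is b for b in blocked)]

        def effective(t):
            # With inheritance, the owner borrows the highest blocked priority.
            if inherit and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_mutex:
            owner = current
        current.work -= 1
        if current.work == 0:
            current.done_at = tick + 1
            if owner is current:
                owner = None  # release the mutex on completion
    return {t.name: t.done_at for t in tasks}

print("no inheritance:  ", simulate(False))
print("with inheritance:", simulate(True))
```

In this made-up schedule the high-priority task finishes at tick 9 without inheritance, because it has to wait out the medium-priority task, and at tick 5 with inheritance.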
QNX is also a very good alternative. Very fast context switching and extremely robust memory protection. I think with QNX you can even buy a license suitable for use in medical devices (i.e. you absolutely cannot afford to have the OS crash for any reason).
I've heard rumours that Wind River is dropping AE since nobody is using it. After our experience, I pity whoever tries it.
Also, unless you get the source to VxWorks, which usually costs a lot of $$$, debugging is a complete nightmare, especially when you hit a bug in the Wind River code (and there's a lot of them). Hell, they couldn't even implement malloc right!
Wind River is coming out with version 6 of VxWorks, but it is basically an enhanced version of AE. I'm not holding my breath.
-Aaron
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
You should always trust the results from the more powerful female computer.
There goes my theory of an alien changing inittab to make it to boot to runlevel 6.
Everyone is born right-handed; only the greatest overcome it
Interesting stuff.
Just one correction: Wind River now sells the VxWorks source code quite cheaply, as long as you are on the current licence model (which unfortunately costs the earth). Not as cheap as Linux, however...
Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
NASA engineers reported on Monday that they had reproduced in their test facility the same error that hit the Spirit rover last week. Basically, the file system had no proper limits set and had created too many files, leading to a situation where the software would gag. They are now deleting older files, such as those generated while traveling to Mars. See: http://www.nytimes.com/2004/01/27/science/space/27MARS.html
======= ~\_/~\_O Burmese
The conventional wisdom is "a production system doesn't have bugs so we don't need to protect against them". They obviously never use their own chip designs. It takes so much longer to ever get the damn system working, and when it works you still don't know if you're just being lucky with memory corruption. I'm really going to have to shoot the next manufacturer I see quoting that memory protection is unnecessary. Unfortunately, embedded chip manufacturers all seem to have forgotten that development time needs to be factored into cost.
First off, on a modern PowerPC processor, memory protection (that is, without virtual memory support) can be implemented very cheaply. If you can do it just with the IBAT/DBAT registers, it should be a constant-time overhead, which is good enough for hard-real time. Oddly enough, I can't find a single reference on the net that measures the cost of memory protection alone on a modern CPU. Anyone? Anyone?
I don't know about PowerPC, but a typical ARM MMU has a very small overhead, especially if you use large page sizes. From memory, there's 1 page directory entry (or large page) per 4MB of address space, which points to a page table containing 32-bit entries for each page (4KB or 64KB pages). There are something like 32 TLB entries, and lookups are done in hardware straight from memory.
So, every time there's a TLB miss, there's an additional latency of two 32 bit SDRAM reads. You can put the tables in internal SRAM (if you have it), which means it only takes a few extra clocks on a TLB miss. TLB misses generally aren't an overhead in my experience - you're going to be dwarfed by the cache miss cost anyway.
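As a rough illustration of why a miss costs two reads, here is a toy two-level lookup in Python. It uses the from-memory numbers above (one directory entry per 4MB, 4KB pages), which may not match any particular ARM core, and the dictionaries standing in for the directory, page table, and TLB are assumptions for the sketch, not a model of real hardware.

```python
PAGE_SHIFT = 12   # 4 KB pages -> 12 offset bits
TABLE_BITS = 10   # 1024 pages * 4 KB = 4 MB covered per directory entry

def split(vaddr):
    """Split a 32-bit virtual address into (dir index, table index, offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    table_index = (vaddr >> PAGE_SHIFT) & ((1 << TABLE_BITS) - 1)
    dir_index = vaddr >> (PAGE_SHIFT + TABLE_BITS)
    return dir_index, table_index, offset

def translate(vaddr, directory, tlb):
    """Return (physical address, number of table reads this lookup needed)."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpn in tlb:
        return tlb[vpn] + offset, 0        # TLB hit: no table reads
    dir_index, table_index, _ = split(vaddr)
    page_table = directory[dir_index]      # read 1: directory entry
    frame_base = page_table[table_index]   # read 2: page table entry
    tlb[vpn] = frame_base                  # cache the translation
    return frame_base + offset, 2

# Hypothetical mapping: directory entry 1, table entry 2 -> frame at 0x80000.
directory = {1: {2: 0x80000}}
tlb = {}
vaddr = (1 << 22) | (2 << 12) | 0x34

first = translate(vaddr, directory, tlb)   # miss: walks both tables
second = translate(vaddr, directory, tlb)  # hit: served from the TLB
print(hex(first[0]), first[1], second[1])
```

The first access pays for the two table reads; every later access to the same page is a TLB hit and pays nothing extra.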
Basically, anyone telling you an MMU has overhead is talking out of the wrong orifice. There are very, very few things which would ever need better latency than a couple of cache misses and TLB misses. If you need that kind of low latency, it should be done in hardware.