Debugging The Spirit Rover

Oh, sure... by inertia187 · 2004-02-21 17:43 · Score: 5, Funny

Are there lessons here that we can use here on the third rock for recovery of our messed up machines which we manage from afar via ssh?

As a former co-worker (hi, jwalker!) used to say when people tried to draw ridiculous analogies, "It's exactly like that...only different."

--
A programmer is a machine for converting coffee into code.

Re:Oh, sure... by kraksmoka · 2004-02-21 18:06 · Score: 1, Troll

actually, i had a saying with my high school sweetheart that would better express this idea. . . . .
its like being on a sea cruise, but different!

--
"You never want a serious crisis to go to waste." - Rahm Emanuel
Re:Oh, sure... by JWSmythe · 2004-02-21 19:25 · Score: 4, Insightful

It sounded like the same type of questions non-technical bosses always ask about technical matters.

"We're ordering this brand new hardware that you've never tested before. Can you guarantee it will never crash?"

"Will this database server handle the load of our brand new project?" (without an accurate growth estimate)

"A server 2000 miles away just went down. What happened?" (no ping, no nothing) Hmmm.. Power/NIC/CPU/CPU fan/hard disks?

It really sounds like they did some decent advance planning on those probes, but from other stories I read, they were shooting for 90 days of reliability, which in itself was a hard one to do. What if it turns the antenna the wrong way and looses connectivity? What if it gets hit by lightning? What if it falls in a hole? (go Beagle!)

Sure, relate this to your web server colocated somewhere you're not. Cross your fingers, hold your breath, and hope there aren't a few fatal systems failures, or a bit of human error. I've been responsible for a bit of that in the past, but at least my equipment wasn't a few million miles away.

--
Serious? Seriousness is well above my pay grade.
Re:Oh, sure... by FrostedWheat · 2004-02-22 01:05 · Score: 3, Informative

What if it turns the antenna the wrong way and looses connectivity? What if it gets hit by lightning? What if it falls in a hole? (go Beagle!)

There is a low gain omni-directional antenna that can be used as backup. In fact I think they use it most of the time for commands and just use the high-gain for data transfer back to Earth. Which makes sense, they never need to send large amounts of data to the rover.

No lightning has ever been detected on Mars. Tho it's not impossible, it is very very unlikely. No proper observations of the night side of Mars have been done tho, so they may just be missing it.

And Opportunity did fall into a hole :)
Re:Oh, sure... by sjames · 2004-02-22 03:22 · Score: 3, Informative

Actually though, it's not too bad an analogy. While Earth based servers aren't absolutely unreachable like Spirit, they are often remote, and there are expenses associated with visiting them in person.

Various schemes now exist to help deal with that. Many boards have a small management processor (bmc, server management board, IPMI, whatever) that is used for remote diagnostics and reconfiguration when the main board won't even boot.

Meanwhile, LinuxBIOS supports two complete BIOS images. One 'old reliable' that once working is never changed, and one that can be upgraded freely. Coupled with a watchdog card or timer, it's decently manageable in the field. That work is continuing.

Meanwhile, IBM is pushing the 'blue button' that forces a software reload from an image partition.

In that sense, the problem is strongly analogous. Most of us will not, however, encounter the exact problem that Spirit had, though some embedded device developers just might.
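
The two-image scheme described above can be sketched roughly as follows. This is only an illustration of the idea (try the upgradable image, let a watchdog decide whether it came up, and otherwise fall back to the image that is never changed). The names `Image`, `boot_with_fallback`, and `max_attempts` are made up for this sketch and are not LinuxBIOS's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Image:
    """A bootable firmware image (hypothetical stand-in for a real BIOS image)."""
    name: str
    # boot() returns True if the image came up and checked in before the
    # watchdog timer expired, False if the watchdog had to reset the board.
    boot: Callable[[], bool]


def boot_with_fallback(upgradable: Image, fallback: Image, max_attempts: int = 2) -> str:
    """Try the freely upgradable image first; fall back to the known-good one."""
    for _ in range(max_attempts):
        if upgradable.boot():
            return upgradable.name
    # The upgradable image never checked in, so use 'old reliable'.
    if fallback.boot():
        return fallback.name
    raise RuntimeError("both images failed to boot")


if __name__ == "__main__":
    broken_upgrade = Image("normal", lambda: False)
    old_reliable = Image("fallback", lambda: True)
    print(boot_with_fallback(broken_upgrade, old_reliable))
```

The point of the design is that a bad upgrade can only ever break the image you are allowed to break; the fallback is never written to once it works.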
Re:Oh, sure... by Anonymous Coward · 2004-02-22 04:12 · Score: 0

In that sense, the problem is strongly analogous. Most of us will not, however, encounter the exact problem that Spirit had, though some embedded device developers just might.

Who are these embedded device developers, and why are they millions of miles away from their devices?
Re:Oh, sure... by sphealey · 2004-02-22 08:37 · Score: 1

No lightning has ever been detected on Mars. Tho it's not impossible, it is very very unlikely.
Wow, that's interesting. I would have thought with the monster dust storms on Mars that there would be some lightning. Do you have any links on the lack thereof?
sPh
Re:Oh, sure... by dougmc · 2004-02-22 08:41 · Score: 1

Actually though, it's not too bad an analogy. While Earth based servers aren't absolutely unreachable like Spirit, they are often remote, and there are expenses associated with visiting them in person.
If you're thinking of traditional servers, sure. But the problem with Spirit is hardly unique. Satellites fail, and often people down on the ground are trying to repair them remotely just like was done with Spirit. Or suppose you're exploring the Marianas trench with some little remote controlled submersible and something goes wrong -- if it can't just float up on its own (hopefully they usually have slightly positive buoyancy), it may very well be lost if you can't find a way to get it at least somewhat repaired.
In theory, you could retrieve these things, but the cost is probably too much to make it worthwhile.
Re:Oh, sure... by sjames · 2004-02-22 10:12 · Score: 1

When you've been paged out of bed by a screwy router on a chilly winter night, it SEEMS a million miles away.
Re:Oh, sure... by FrostedWheat · 2004-02-22 10:14 · Score: 1

Here is one page on the subject.

There probably is some lightning in the dust storms, but nowhere near the scale we get here on Earth. Although large, the dust storms are still very thin.

From what I can tell none of the probes are looking, but I'd say if it was a common phenomenon then it would have been detected already.
Re:Oh, sure... by sexecutioner · 2004-02-22 12:04 · Score: 1

Sorry but the idea that no photos of Mars at night have been taken is a little silly.

If you look at Mars from Earth then the majority of the time you will see a crescent. This is due to the geometry of the Sun/Earth/Mars system; have a think about it. The dark part of the crescent is (surprise, surprise) night time on Mars.

This also applies to the moon. So the next time someone mentions the "dark side" of the moon ask them what side that will be during a solar eclipse. Oh, that's right, the one facing the sun in broad daylight.
Re:Oh, sure... by Anonymous Coward · 2004-02-22 14:00 · Score: 0

What if it turns the antenna the wrong way and looses connectivity?

Then they'll just have to turn the antenna the other way and tighten the connectivity!
Re:Oh, sure... by Anonymous Coward · 2004-02-22 14:05 · Score: 0

Nope. I have to say your version sucks cock.
The original was much different.

BTW, I bet you never fucked your 'sweetheart', especially if you're using teh ghey name for her.
(Sick fuck your aunt is, she probably likes it.)
Re:Oh, sure... by FrostedWheat · 2004-02-22 22:26 · Score: 1

If you look at Mars from Earth then the majority of the time you will see a crescent.

Only two planets ever show a crescent as viewed from Earth: Mercury and Venus. From Earth, you never ever see much of the night side of Mars. You may see the morning or night terminator plus a little bit of shadow. Not enough to do a proper study.

Here is about as much of the night side of Mars you'll ever see from Earth.

Local Debugging by webmaestro · 2004-02-21 17:44 · Score: 3, Funny

Man, I have a hard enough time debugging programs running on my local machine.

Re:Local Debugging by srichand · 2004-02-21 17:56 · Score: 5, Funny

In other news stories, the Microsoft Corporation decided to sue NASA, apparently since the right to crash systems was only theirs. Not to be left behind, SCO insisted that the code that caused the failure was unethically copied from their source repositories. This has indeed caused a flutter in the space communities.
Re:Local Debugging by Anonymous Coward · 2004-02-21 23:16 · Score: 0

Words can't express how much you suck. That was the worst joke anyone has ever posted.
Re:Local Debugging by Anonymous Coward · 2004-02-22 01:28 · Score: 0

But it had all the right ingredients; Microsoft & SCO are bastards. Keeps the fanboys happy while they pretend they can contribute to a discussion on NASA hardware. Here's a hint: They don't fuck about with Gentoo ...
Re:Local Debugging by Anonymous Coward · 2004-02-22 04:15 · Score: 0

I hope this doesn't start a new trend of outsourcing the jobs of debuggers!

Ooogh, I really don't want to have to move to Mars to get a job, but in this economy... :/

I dont know about learning much.... by detritus` · 2004-02-21 17:45 · Score: 4, Funny

I dont think i want to learn too much from this as the solution was the equivalent of rm -rf... On a side note i wonder when the 40 min ssh delay jokes will begin again
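
For what it's worth, the delay is easy to ballpark. A rough sketch, assuming round figures of about 54.6 million km for the closest Earth-Mars approach and about 401 million km for the farthest separation (both approximate, and neither is the actual distance on any particular day of the mission):

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

# Approximate extremes of the Earth-Mars distance (assumed round figures).
NEAR_KM = 54.6e6
FAR_KM = 401e6


def one_way_delay_minutes(distance_km: float) -> float:
    """Time for a radio signal to cover distance_km, in minutes."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


def round_trip_delay_minutes(distance_km: float) -> float:
    """Time from sending a command to seeing the reply, ignoring processing."""
    return 2.0 * one_way_delay_minutes(distance_km)


if __name__ == "__main__":
    for label, km in (("near", NEAR_KM), ("far", FAR_KM)):
        print(f"{label}: one way {one_way_delay_minutes(km):.1f} min, "
              f"round trip {round_trip_delay_minutes(km):.1f} min")
```

That works out to roughly 3 minutes one way at the near end and roughly 22 at the far end, so a 40 minute round trip is about the worst case.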

--
drunk chemists

Re:I dont know about learning much.... by BlueTrin · 2004-02-21 17:51 · Score: 1

well i don't know about these jokes, but my Beowulf cluster has a delay of (x machines) x 40 minutes

--
Don't you know it is now both immoral and criminal to think beyond the next quarterly report?
Re:I dont know about learning much.... by MrBlue+VT · 2004-02-21 19:15 · Score: 1

Eh, it's not like they were storing critical data files in that flash memory, it was a few pictures and other scientific data. It sure beats losing the whole rover.
Re:I dont know about learning much.... by Bill+Privatus · 2004-02-22 14:14 · Score: 1

If it was pr0n, someone should lose their job....that's a work machine!
Even if it was SouthernCharms!

--
Redundancy is good; triple redundancy is twice as good! - Me.

well by whackco · 2004-02-21 17:45 · Score: 4, Funny

at least it wasn't a blue screen?

Re:well by Anonymous Coward · 2004-02-22 06:34 · Score: 0

LOL if it was running Windows XP it never would have had this problem as Windows XP is a superior product to any *nix

Like this? by The+Human+Cow · 2004-02-21 17:45 · Score: 4, Funny

man rover?

--
The Human Cow - bringing you scrumtrelescence since 1995

Re:Like this? by toygeek · 2004-02-21 19:48 · Score: 1, Funny

Man Rover? What does this article have to do with Richard Simmons?

--
Nobodies Prefect
Tidbits for Techs Technology Blog

Remote debugging? by Nimloth · 2004-02-21 17:47 · Score: 4, Funny

I don't get it, couldn't NASA afford the on-site warranty?

Re:Remote debugging? by sysbot · 2004-02-21 18:06 · Score: 0

Nope.
Re:Remote debugging? by kfg · 2004-02-21 18:08 · Score: 5, Funny

Yeah, but they thought they could save a few bucks and got the Gateway consumer version.

"Oh, you've got the on-site warranty, huh? Ok, first thing you have to do is ship it to South Dakota. . ."

Oh, hey, looks just like Mars.

KFG
Re:Remote debugging? by Anonymous Coward · 2004-02-21 18:17 · Score: 0

Shipping it to South Dakota isn't going to help. Gateway moved to California in 1998.
Re:Remote debugging? by operagost · 2004-02-21 18:25 · Score: 2, Funny

When you get the on-site warranty, make sure they tell you WHICH site!

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:Remote debugging? by kfg · 2004-02-21 18:32 · Score: 1

No. The executives moved to California. The schlubs are still in South Dakota. You think they're going to pay assemblers San Diego wages?

Gateway Factory

Although bits of California look like Mars too, except for the funny color of the sky.

KFG
Re:Remote debugging? by kfg · 2004-02-21 18:36 · Score: 0

Sometimes I learn things the hard way.

KFG
Re:Remote debugging? by MegaFur · 2004-02-22 04:11 · Score: 1

Actually, if you have on-site warranty, you really can get a Gateway guy to show up, but he'll never do more than install a part. Really. Absolutely zero troubleshooting. Don't even expect him to notice or plug in an unplugged power cable.

Also, do expect to have to wrangle a bit with the tech on the phone before they'll actually send someone out onsite.

--
Furry cows moo and decompress.

lots of mem of an embedded system by millette · 2004-02-21 17:49 · Score: 4, Funny

Wow, I didn't expect the rover had 128MiB of RAM, or 256MiB of flash. Funny to think they had to run chkdsk from so far away :)

Re:lots of mem of an embedded system by clifgriffin · 2004-02-21 18:00 · Score: 1

chkdsk is windows.

I doubt they are running windows.

--
clifgriffin > blog
Re:lots of mem of an embedded system by You're+All+Wrong · 2004-02-21 22:05 · Score: 3, Informative

VxWorks

A highly respected embedded OS.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.

Space Technology by superpulpsicle · 2004-02-21 17:49 · Score: 5, Insightful

That's the thing that amazes me. Any technology having to do with space seems that much more advanced.

Here on earth we can't even build cars that require no maintenance and last more than 10 years.

Re:Space Technology by Naffer · 2004-02-21 17:51 · Score: 1

Do the people building the cars want them to last 10 years with no maintenance? Dealerships make wads of cash in their auto shops.
Re:Space Technology by Anonymous Coward · 2004-02-21 17:53 · Score: 1, Funny

I couldn't afford a car that NASA built... :P
Re:Space Technology by Anonymous Coward · 2004-02-21 17:53 · Score: 2, Insightful

Yeah offer to pay $800 million for a custom built car, and you can bet it will last 90 days too.
Re:Space Technology by Anonymous Coward · 2004-02-21 17:56 · Score: 0, Offtopic

Here on earth we can't even build cars that require no maintenance and last more than 10 years.

Sure we can. It's just that nobody does, because making completely reliable, long-lasting products is not good business.

If your products die one day after the warranty is up, or if they last forever, you kill your repeat business. Companies have to strike that happy medium to keep people coming back for more.
Re:Space Technology by beeplet · 2004-02-21 17:57 · Score: 5, Insightful

Actually any technology making it into space is more likely to be 10 years out of date... Getting anything certified for space is a long process. The technology in space isn't more advanced, just much better documented and well-understood.
Re:Space Technology by Billly+Gates · 2004-02-21 18:02 · Score: 4, Insightful

The Japanese started that.

They make a lot of money from loyal customers. But I admit my 13 year old 91 Honda Civic with 140k miles is getting on my nerves with repair costs. Would a 91 Ford Escort still be running today? I think not.

I will buy only Toyatas and Honda's for that reason.

It amazes me consumers are too stupid to read consumer reports and buy cars on looks. Repair costs for things like Cadillacs and BMWs are not cheap for TCO! Yes, consumer products have TCO too, and we, not just businesses, should look at that as well.

--
http://saveie6.com/
Re:Space Technology by Anonymous Coward · 2004-02-21 18:03 · Score: 0

Until you realize that an MS-DOS computer in that same situation wouldn't have crashed.
Re:Space Technology by kfg · 2004-02-21 18:12 · Score: 5, Insightful

Ten years out of date, but ten years more reliable for the effort.

Sort of like Debian.

Cutting edge ain't always what it's cracked up to be.

KFG
Re:Space Technology by sangreal66 · 2004-02-21 18:18 · Score: 0, Redundant

My '89 temp runs great...
Re:Space Technology by kfg · 2004-02-21 18:21 · Score: 4, Interesting

No. You can't make a mechanical device like a car that requires no maintenance. Bearings wear out. Hoses and belts have a limited lifespan even if you never drive the car, etc. This is the real world. We will obey the laws of thermodynamics. Entropy always wins.

What you can do is make it require less maintenance, make that maintenance cheaper to perform, and make the car last until you hit something really hard so long as you maintain it. You should be able to hand your car down to your kids.

Other than that you're bang on though.

I wonder what we can learn from that about maintaining our computers?

KFG
Re:Space Technology by afidel · 2004-02-21 18:21 · Score: 1, Informative

Just gave a 93 Ford Taurus to my brother's fiancée; it runs great, and in the 5 years I owned it I had to replace a seal on the radiator and that was IT other than oil and gas. My current car is a 99 Taurus with 158K miles and I haven't put a dime into it other than oil and brakes. I need to do the spark plugs, as the fuel economy has gone down this winter and that's the most likely cause =)

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Space Technology by Kurt+Russell · 2004-02-21 18:23 · Score: 0, Redundant

Would a 91 Ford Escort still be running today? I think not.

And why not? I have a 90 mustang 5.0 with 168k on the clock. My girl has an 88 grand am with 220k..;-)
Re:Space Technology by Anonymous Coward · 2004-02-21 18:31 · Score: 0

They make a lot of money from loyal customers. But I admit my 13 year old 91 Honda Civic with 140k miles is getting on my nerves with repair costs. Would a 91 Ford Escort still be running today? I think not.

I will buy only Toyatas and Honda's for that reason.
Overrated--- mod this bullshit down. ALL CARS break down. Sheesh..
Re:Space Technology by Biogenesis · 2004-02-21 18:36 · Score: 1

Maybe a 91 Ford Escort wouldn't be running, but my 91 Ford Falcon sure is, and with ~260Mm (convert to miles yourself, I'm too lazy to use Google) on the clock it's still going strong with only a bi-yearly service.

--
...just so Google finds it.
Re:Space Technology by Chrispy1000000+the+2 · 2004-02-21 18:45 · Score: 0

Well, as to debunk your theory about Fords in general, I call into evidence the following:

My car is a 1993 Ford Topaz and it's still running! Mind you, the E-brake, horn, normal brakes, RPM gauge, door locks, seatbelts, transmission, ball joints, cam shaft, air conditioning/heating panel, fan ducts, block heater, key thingy, and throttle cable all need a little work, but it still runs!

And it's a standard to boot, can spin tires on really, really slippery ice, and can get to 115kph with a good 30k stretch, and a lot of shaking.

Can you tell that I hate my car?

--
Sig
Re:Space Technology by Kymermosst · 2004-02-21 19:14 · Score: 1, Interesting

I have a 90 mustang 5.0 with 168k on the clock.

I've got a '70 Mustang with 190K on the clock. Ran fine before I took the engine out. (It needed new head gaskets and the intake manifold was cracked, but it ran well).

--
"Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
Re:Space Technology by Anonymous Coward · 2004-02-21 19:18 · Score: 0

Jeez, that was a real half-hearted attempt at a troll. Come on, you can do better than that!
Re:Space Technology by Anonymous Coward · 2004-02-21 19:47 · Score: 0

Crackhead moderators, I respond to a post saying basically that only the Japanese make reliable cars by stating that I have similar American made cars with no need for " is getting on my nerves with repair costs."
Re:Space Technology by alwaystheretrading · 2004-02-21 19:48 · Score: 3, Funny

Here's an example of the Mars Rover's 10 year old networking technology:
Ring, Ring, Ring....
"Welcome to the Mars Rover answering system. For English press 1, Para Espanol prensa 2"
BEEP
"You selected English. To leave a message for Spirit press 1. To leave a message for Opportunity press 2"
BEEP
"You selected Spirit. Transferring now." CLICK "I'm sorry, Spirit is unavailable at this time. To leave a message press 1. To return to the main menu press 2"
BEEP
"Hi this is the Spirit rover. I can't come to the phone right now but if you'll leave a message I'll get back to you." BEEEEEP
"Spirit, this is NASA. Please phone home when you get a chance. I think your fax machine has jammed and we need you to re-send. Thanks, bye"
Re:Space Technology by Anonymous Coward · 2004-02-21 19:57 · Score: 0

I used to work in a nuclear power plant.
It was the same situation: it might have been old technology, but it was reliable and proven technology.
Re:Space Technology by Enoch+Zembecowicz · 2004-02-21 20:05 · Score: 1

I understand that the Spirit and Opportunity rovers use the VME bus, which was used by Sun (and probably some others) in the 80s and early 90s.

--
"Who's going to believe a talking head?" - Herbert West
Re:Space Technology by Anonymous Coward · 2004-02-21 20:08 · Score: 0

I actually drive a 91 Ford Escort, and it still runs. Though the costs of repairs are going up too. So I guess its a global conspiracy.
Re:Space Technology by Anonymous Coward · 2004-02-21 20:17 · Score: 0

actually cars can be quite complex compared to space robots.
look into the engineering of something that we take for granted like seatbelts and you can see that cars have been evolving for quite some time, and will continue.
Re:Space Technology by Anonymous Coward · 2004-02-21 20:24 · Score: 0

My wife's '92 Mustang LX still runs, but like a train (at least it sounds like one) at 102k miles.

My Celica doesn't run as smoothly as it once did, but you can hardly hear it and it IS still smooth at 240k.

I have owned two 1980 Mustangs (the pace car and a non-turbo) and one 1980 Mercury Capri, and none of them ran over 100k. The reason I bought these (and also recommended that my wife buy her Mustang) is because they are cheaper used! I had those '80 cars when I went to college. One of my classmates bought a brand new 1984 Mustang and had it serviced 6 times within the first year. The lemon law didn't exist back then.

All in all I've seen a lot more Japanese cars put on high mileage than American ones!
Re:Space Technology by Ishikawa+Goemon · 2004-02-21 20:34 · Score: 1

Okay, I have to bite...

I've got a well running 94 Escort. Not 91, but I'm sure it will be going strong in three years. 112k miles on it. I bought it coming off a lease at 31k, soon afterwards drove cross-country in it, and made numerous regional trips until my wife bought a probe that was a bit better for the long trips. Later her car was totaled (while parked...), so the escort became the trip car once again. Recently, I bought an explorer for my daily commute, and now my wife uses the escort for errands around town.

While not the nicest car built, it's been extremely solid.

I've never been one for routine maintenance, such as oil changes and tune ups. The worst I've done to it is drive it to work and back for a week with no oil plug. I had the oil replaced (at least 1k past due, as always...) and noticed the next day the car was making an awful metal noise. I didn't connect the events, however, as I blamed the brake pads that were needed 6 mos. prior, figuring the metal was poking through the lining. After a fillup, it was purring like a kitten again...

Other than that, one set of tires (maybe 1.5, but it's somewhat fuzzy...), the timing belt that was supposed to be replaced by 80k which I learned about at 105k, and a tuneup my wife snuck in a few thousand miles back, I've not touched it...

Now, I will certainly concede that this is probably the exception, not the rule, but I cannot let a bad word be said about Ford Escorts without telling my story...
Re:Space Technology by Zakabog · 2004-02-21 20:39 · Score: 2, Informative

I have a 90 ford mustang you insensitive clod. Still runs strong today, has like 107,000 miles on it and I'm sure it'd destroy your civic in a race ;-P. The only money I've really been spending is on a tune up, and new tires (old tires were crappy and leaking air.) And besides when someone buys a Cadillac or BMW (and god damn it it's Toyota, what the hell is Toyata) they don't care about the price. When you're going to spend $30,000 on a "cheap" BMW 3 series you're not gonna care that it's going to cost you x amount more than a cheap japanese car.

Cadillacs I don't really know too well, but I know a BMW doesn't need a whole lot of repairs. Most german cars are VERY well built. Much better than japanese cars too. And what good's a car that'll last you forever if you don't like the piece of shit in the first place. I just bought a new car (My mustang's in NY, my sister drives it now, my grandmother didn't like the idea of me driving across country in a 1990 Mustang with 300+ rwhp, on such long straight roads, top speeds 145 btw), I could have gone with a VERY cheap Honda Civic, it would probably last me most of my adult life but why would I want such a piece...? I bought a fully loaded 2004 Nissan Sentra SE-R SpecV, it's a quick car, with low insurance and great looks. I wouldn't have bought anything less, I didn't look into TCO at all, it didn't really matter to me. I don't want a car that'll last me forever if I don't like it. And most people let the dealer pick out the car they want, they don't really realize it but they don't care, the average person wants to get from point A to point B, and the salesman is gonna try to sell them a car that costs a lot of money, not caring about the life of the car.
Re:Space Technology by Anonymous Coward · 2004-02-21 20:40 · Score: 1, Funny

Damn gentoo zealots always trying to plug their distro in any threa..
Oh wait did you say debian? Nevermind.
Re:Space Technology by Anonymous Coward · 2004-02-21 21:00 · Score: 0

Well, my Nissan is over 12 years old and still going strong. Of course, regular maintenance - change the oil every 3K miles using 5W30 oil really keep the engine going strong
Re:Space Technology by stephanruby · 2004-02-21 21:45 · Score: 1

Actually, I'm looking at my Consumers Report Year 2000 Buying Guide Book and only the Ford Escort from 1994 and the Ford Festiva from 1993 were recommended as reliable used cars for under $6,000. The Ford Escort from 1991, I take it, didn't make the list.
On the other hand, the Honda Civic from 1991 did make the list. But it's all a question of averages. Maybe the Honda Civic owner was unlucky, or maybe he didn't take care of his car as much as other people. In either case, we'll never know.
Re:Space Technology by FigWig · 2004-02-21 21:45 · Score: 0, Offtopic

German cars have great build quality but crappy parts. BMW, Mercedes, VW, Audi all have much worse reliability than the Japanese manufacturers. The cars however are much more fun to drive than the Japanese appliances.

--
Scuttlemonkey is a troll
Re:Space Technology by stephanruby · 2004-02-21 21:52 · Score: 1

*Correction
Actually, I'm looking at my Consumers Report Year 2000 Buying Guide Book and only the Ford Escort from 1994 and the Ford Festiva from 1993 were *the only Ford cars that were recommended as reliable used cars for under $6,000. The Ford Escort from 1991, I take it, didn't make the list.
On the other hand, the Honda Civic from 1991 did make the list. But it's all a question of averages. Maybe the Honda Civic owner was unlucky, or maybe he didn't take care of his car as much as other people. In either case, we'll never know.
Re:Space Technology by stephanruby · 2004-02-21 22:06 · Score: 0

I wonder what we can learn from that about maintaining our computers?
Computers are not like cars. You can and you should still assemble your own. Do the research. Get yourself an expensive quality power supply. You won't save that much money and you'll spend maybe 30 hours doing the initial research, but at least your computer won't ever crash again and it will be easy to upgrade.
Re:Space Technology by RogerWilco · 2004-02-21 22:23 · Score: 1

I currently drive a SAAB 900 from 1991, with 330k km ~ 206k miles, and had a 1986 and 1981 model before that. I buy these cars at about 200k, drive them for 3-4 years, and as long as they have regular maintenance (oil, cooling fluid) they just keep on going.
The reason I buy a new one is if the current one starts rusting.
B.t.w. if someone in the north of the Netherlands needs a SAAB cheaply, I recommend the guy I bought mine from: www.vdlaansaabspecialist.nl

--
RogerWilco the Adventurous Janitor
Re:Space Technology by Anonymous Coward · 2004-02-21 22:31 · Score: 0

And it would take a decade of development until they realized they just spent a hundred billion dollars on a car when they still don't even have a working prototype of a wheel.
Re:Space Technology by loic_2003 · 2004-02-21 23:22 · Score: 1

You've got moving parts in a car, there's no way you could make one to last 10 years straight with no maintenance. Things deteriorate even if the car is never used. Oil becomes thinner and separated with time and heat; belts, gaskets etc all wear out, not to mention things like brake pads and tyres. Sure you could make a vehicle out of the best metals that never deteriorate, but when you have metal against metal at 1 -> say, 6500 RPM, there's no way you can prevent all wear and tear for that long.

--
http://www.frenchgeek.com/
Re:Space Technology by destiny_uk · 2004-02-22 01:32 · Score: 1

1988 Ford Escort - 1.6 Diesel (non-turbo)

250k miles.

Never misses a beat...
Re:Space Technology by CharlieG · 2004-02-22 01:46 · Score: 1

My 94 Mazda b2300 pickup, which is a Ford Ranger with different badges has 327k miles on it with no major repairs, and the original clutch

--
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
Re:Space Technology by Anonymous Coward · 2004-02-22 02:19 · Score: 0

Give me a B... give me a U... give me a L... give me another L

What does that smell? BULLSHIT!

I dare you to put a nice 80's Mercedes vs. another 80's vintage Honda in the middle of winter side by side. I would take the Mercedes any day.... I have a crappy diesel 'cedes, it is over 18 years old. It just keeps on running, it is built like a tank too. Most of the trim is still there, heck even the paint job is holding on pretty good. Most Toyotas and Hondas of the same year I see on the roads are falling apart, and I keep my car outside and I live on a grad student allowance. I wouldn't trade this car for anything else out there right now.

Also I hate to disappoint you but most car part contractors cater to Japanese, European, and Japanese manufacturers.
Re:Space Technology by Endive4Ever · 2004-02-22 03:26 · Score: 1

but at least your computer won't ever crash again an

Huh?

My computer won't ever crash again because I know how to operate a phillips screwdriver?

I've been 'building clones' since the 80's when I bought my first 8088 motherboard at a swapmeet. When do the computers stop crashing?

--
---
Re:Space Technology by roman_mir · 2004-02-22 03:47 · Score: 1

So what you are saying is that in 10 years time a new Mars rover will only be as good as today's Fords, but it will have electronic cup holders?

--
You can't handle the truth.
Re:Space Technology by tzanger · 2004-02-22 03:49 · Score: 1

My '94 Jeep Grand Cherokee has 136k miles on it and is still going strong... Longevity is not something only the Japanese manufacturers can build into a vehicle.
Re:Space Technology by tzanger · 2004-02-22 03:54 · Score: 1

Diesels are another beast entirely. I must admit though that if I were looking for a new vehicle I'd take a good hard look at diesels; the engines are built like tanks, the petrol's a little cheaper and they just plain old _last_.

Not to mention that I could run it on grease if I really wanted to. :-)
Re:Space Technology by jkujawa · 2004-02-22 04:33 · Score: 1

Ford Escorts share a platform with the Mazda 323. They have a different skin, but they're the same car.
Re:Space Technology by back_pages · 2004-02-22 04:53 · Score: 1

They make a lot of money from loyal customers. But I admit my 13 year old 91 Honda Civic with 140k miles is getting on my nerves with repair costs. Would a 91 Ford Escort still be running today? I think not.
For what it's worth, I have a 93 Ford Escort that is running just fine. All I've had to repair in the last 3 years is a timing belt, rusted out exhaust (lived in Michigan), and tie rods. Everything else has been standard maintenance items like brakes, tires, etc.
1992-1994 was the turning point for American car manufacturers. There are a few exceptions (like early Neons) but most American cars after those years will compete well with imports if you want reliable transportation. My Escort still gets 30+ highway miles to the gallon.
Re:Space Technology by Anonymous Coward · 2004-02-22 05:08 · Score: 0

And, before them, the Germans (well, Porsche was Austrian, I think. And what about that Diesel fellow? - yes, yes, I *am* ruddy ignorant. I know. :) ).

Fact is, VW beetles are still going strong, in this end of the Earth. They'll go places most city cars won't even get close to, take to dirt tracks with panache, tolerate a wide variety of gas "quality", can be fixed by practically any "forest mechanic", and you can buy affordable spare parts at practically any drugstore or trading post.

Hm.
So, maybe if they hacked a 70's beetle (cooling/heating might be a problem). And sent up... :)
Re:Space Technology by Anonymous Coward · 2004-02-22 05:49 · Score: 0

Honda zealots piss me off. My roommate is one as well. I drive a '93 Pontiac and he drives a '93 Honda. Last year he paid 10 times what I did in repairs. The price of Honda parts didn't help either.

But hey, he's living with Honda blinders on, citing how great their new cars are even though his is ELEVEN DAMN YEARS OLD.

New cars, I'd love a rice-mobile, but with cars I have to pay to fix, domestic any day.
Re:Space Technology by DerekLyons · 2004-02-22 05:51 · Score: 3, Informative

That's the thing that amazes me. Any technology having to do with space seems that much more advanced.

Here on earth we can't even build cars that require no maintenance and last more than 10 years.
Most of the stuff in space that lasts ten years usually has no moving parts, which is what generates much of the maintenance requirements on your car. Nor does it have parts to get fouled, corroded, or otherwise mucked up by the environment of or the operation of the car.

And frankly, if your car isn't lasting ten years, then you bought junk in the first place. Of the four cars I've owned, not one has had a lifetime of less than ten years. Three of them were already older than that when they came to me, and none lasted me less than four years. (Other than the one that got repossessed, but I had that one three years.) But then I invest in regular maintenance, don't leadfoot, etc...
Re:Space Technology by Anonymous Coward · 2004-02-22 05:56 · Score: 0

The car I had previous to my current one was a 1984 Pontiac 6000 with a V6. My dad bought it used with just under 100k miles on it. He drove it till it had 310k (that's THREE hundred thousand!) miles on it, then gave it to me (lucky me).

I drove it with minimal repair costs till it had 345k miles, then the tranny gave out. I then gave the car to a friend of the family's, and the engine is now powering a bizarre dune buggy contraption in northern Minnesota.

The moral of this is Consumer Reports can give you a better chance of getting a good car, but it doesn't guarantee it at ALL. Running Mobil 1 synthetic motor oil probably helps too.
Re:Space Technology by Anonymous Coward · 2004-02-22 06:06 · Score: 0

Cutting edge ain't always what it's cracked up to be.

Boy are you right. So let me rephrase this to make it a little stronger:

Cutting edge technology loses a lot of its appeal once you're 10 light-minutes away from the nearest band-aid.
Re:Space Technology by Anonymous Coward · 2004-02-22 06:29 · Score: 0

Anyone can! IIRC the moon rover was offered for free, buyer collects.....
Re:Space Technology by Anonymous Coward · 2004-02-22 06:33 · Score: 0

My 94 ford escort is still a great runner, and should easily last another 3 or 4 years. And I have a friend with a 90 ford escort that still runs well enough (although I'd guess it only has a year left).
Re:Space Technology by doggkruse · 2004-02-22 06:52 · Score: 1

My friend and I wrecked a dune buggy in northern Minnesota (Ely, to be specific). I think that type of thing is popular up there (nothing else to do).
Re:Space Technology by ChrisMaple · 2004-02-22 06:53 · Score: 1

I find my economy going down every winter and being restored in summer. Winter gasoline has more oxygenated (pre-burned) fuel additives, and tires are stiffer (cause more drag) when they're cold.

--
Contribute to civilization: ari.aynrand.org/donate
Re:Space Technology by XO · 2004-02-22 08:37 · Score: 1

I would presume that if I still had my 1990 Olds Calais, it would still be running. But, it was stolen, and I THINK sold for scrap parts, as I found what I believe was its TRUNK LID attached to another car.. (it had very specific markings on it, unique to that one car)

My 1993 Daytona is still running (although it did have an engine put in it, a year ago.. that engine is from a 1991 vehicle.. the original engine had a severe mechanical failure from me seriously abusing it.. i.e., I was stuck in 3 feet of snow, and trying to get out)

Cars from the 70's lasted a lot longer I think. But lots of early 90's stuff still out there.

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
Re:Space Technology by tornado2258 · 2004-02-22 09:21 · Score: 1

the petrol's a little cheaper
I like that.
Re:Space Technology by eison · 2004-02-22 10:03 · Score: 1

My '92 Jeep Cherokee Laredo with 140k miles has cost me, over the past decade:
Replace shocks
Replace water pump
Replace rusted muffler (my own fault for letting it rust)
Replace rusted radiator (again, my own fault, neglected recommended servicing)

That's it. Hardly annoying. You might want to reconsider your anti-American-manufacturing stance, I think it's a few decades out of date.

--
is competition good, or is duplication of effort bad?
Re:Space Technology by TKinias · 2004-02-22 12:59 · Score: 1

scripsit roman_mir:

So what you are saying is that in 10 years time a new Mars rover will only be as good as today's Fords, but it will have electronic cup holders?

Electronic cup holders are already old tech. My old Pentium had one a decade ago -- and it was even retractable!

--
In principio creauit Linus Linucem.
Re:Space Technology by teridon · 2004-02-22 17:08 · Score: 1

Ten years out of date, but ten years more reliable for the effort.
Uhm, do you work in the space industry? I do ... and the first guy was right. Anything in space is usually 10-15 years behind ground computers. And even the ground computers get out of date. For safety (and lack of budget), the ground controllers often continue to use the same systems they launched with to fly the bird; even 5-10 years later. These are government contracts, where the *lowest bidder* usually wins, remember? And the lowest bidder often fails to leave enough budget to fix 5-10 year old problems.

On another note, the project I work on used to have a VxWorks-embedded OS on our front end processors. It used to crash at least twice a day... Luckily we had enough budget to rehost to software-based HPUX front ends.

--
I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson
Re:Space Technology by stephanruby · 2004-02-22 21:15 · Score: 1

I don't know, I only reboot mine when I go on vacation once a year.
Re:Space Technology by Endive4Ever · 2004-02-23 01:39 · Score: 1

And you utilize the 10-40% of its capabilities that are possible because that's the part that has been reverse engineered.

This isn't the place for that fight. But I had to point it out.

I can put a 386sx box out on the end of an ethernet cable and run Linux on it without it crashing for probably a decade or more. I could do that using an old Slackware 3.2 CD.

--
---
Re:Space Technology by fpp · 2004-02-23 09:12 · Score: 1

Wow, look at all these responses, disagreeing with you. This is like someone saying "Most people have 10 fingers", and then hearing from so many people who protest, "Oh, not me, I have 9 fingers..."

The biggest problem with domestic cars, as I see it, is their inconsistency. They can be very good, or they can be very bad. That issue reeks of poor quality control. In my experience, it's almost always bad. With the Japanese cars, you can be pretty sure most of them will be very good.
Re:Space Technology by sparrow_hawk · 2004-02-23 14:04 · Score: 1

Well, I have no personal knowledge of how *many* 91 Escorts are in use today, but I do know that mine works quite well. Certainly, it has had issues, but nothing major that would render it undriveable. It has got about 104,000 miles on it, so it's doing quite well, all things considered.

Oh, and I bought her for less than two grand, and change the oil every three months, so TCO doesn't seem to be a problem as long as I take good care of her.

do they use SSH ? by Anonymous Coward · 2004-02-21 17:50 · Score: 5, Funny

I hope they use SSH or something .. who's to say a future mission ..some hax0r doesn't grab control of a space probe and have it send goatse.cx pics back??

All it takes is a transmitter out in the middle of nowhere Africa or some island .. after all, the probe communicates using known frequencies. There may be problems picking up the return signal without an expensive antenna, I suppose. But then again maybe some hax0r can build one cheaply and/or do what Captain Midnight did ( www.signaltonoise.net/library/captmidn.htm ).

I wouldn't worry about signal jamming, though, as that will probably be discovered easily.

Re:do they use SSH ? by Anonymous Coward · 2004-02-21 17:58 · Score: 1, Insightful

No, they didn't use SSH. However, a lucky hacker would have to have access to a very large radio antenna, like the one atop a volcano in Hawaii.
Re:do they use SSH ? by mcbridematt · 2004-02-21 17:59 · Score: 5, Insightful

I don't think they would bother using anything to do with TCP. Anything you do send you will have to wait 9 minutes for. Just imagine the ping times:

Pinging mars-rover with 32 bytes of data:
request timed out
request timed out
request timed out
64 bytes from mars-rover: icmp_seq=0 ttl=64 time=32400ms :(

If it has anything to do with current internet protocols, it would be UDP.
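For a rough feel of the delay being joked about here, a minimal sketch of the light-time arithmetic. The two distances are assumed round figures for the approximate closest and farthest Earth-Mars separations, not the actual range on any particular day of the mission.

```python
# Speed of light in km/s (exact by definition of the metre).
C_KM_PER_S = 299_792.458

def one_way_delay_s(distance_km: float) -> float:
    """Seconds for a radio signal to cover distance_km."""
    return distance_km / C_KM_PER_S

def round_trip_delay_s(distance_km: float) -> float:
    """Seconds before a reply to a command could possibly arrive."""
    return 2.0 * one_way_delay_s(distance_km)

# Assumed distances (illustrative only): rough minimum and maximum separation.
for label, km in [("near, approx.", 55e6), ("far, approx.", 400e6)]:
    one_way_min = one_way_delay_s(km) / 60
    round_trip_min = round_trip_delay_s(km) / 60
    print(f"{label}: one-way {one_way_min:.1f} min, round trip {round_trip_min:.1f} min")
```

Whatever the protocol, nothing gets acknowledged faster than the round-trip figure, which is why a chatty handshake is a poor fit.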
Re:do they use SSH ? by Anonymous Coward · 2004-02-21 18:09 · Score: 0

Now they do. They were using telnet before, but some hacker broke in and uploaded megabytes of porn to its flash RAM. Eventually the rover ran out of memory and crashed. NASA has now switched to SSH to keep hackers from breaking in again in future.
Re:do they use SSH ? by cookiepus · 2004-02-21 19:51 · Score: 1

I hope they use SSH or something .. who's to say a future mission ..some hax0r doesn't grab control of a space probe and have it send goatse.cx pics back??

Maybe that's what we should do, seeing that goatse.cx got shut down...

--
Ecce Europa - Web Design for Business
Re:do they use SSH ? by AhBeeDoi · 2004-02-21 20:00 · Score: 4, Funny

ttl=64
I realize that Mars is a long way away, but how many routers do you think exist between here and there?
Re:do they use SSH ? by Anonymous Coward · 2004-02-21 21:06 · Score: 4, Insightful

UDP would be even worse. Interplanetary transmission is difficult, so some packet loss is likely. Under UDP the packets would just disappear; it's an unreliable protocol. TCP would of course be too inefficient. I'd expect them to use a custom protocol designed for the specific application, since their situation is totally unlike anything you'll face on Earth.
Re:do they use SSH ? by Anonymous Coward · 2004-02-21 21:15 · Score: 0

Gah, where are my mod points when I need them? :( Good job.
Re:do they use SSH ? by glassesmonkey · 2004-02-21 22:01 · Score: 1

All it takes is a transmitter out in the middle of nowhere Africa

I know this last poster is just trolling, but a lot of people do actually think like this.. Doesn't some common sense kick in at some point?!
(a) hmm.. I'll build a Capt. Midnight (tm) secret decoder antenna and point it up in the sky in some general Mars-ish direction that picks up a tiny X-Band signal
(b) then, using the data that comes one-way down from a rover (that NASA posts most of online) I'll figure out some magic way to 'control' a rover
(c) somehow, I'll then control the rover with my homemade kit-built bicycle-powered transmitter also beaming X-Band signals across the solar system
Re:do they use SSH ? by Anonymous Coward · 2004-02-21 23:51 · Score: 0

Have you seen the size of the radio dishes they use in the Deep Space Network to communicate with Mars? I think it might be a little tricky to build a 34-meter dish in your backyard, never mind point it exactly at Mars while the Earth turns with you.
Re:do they use SSH ? by sjames · 2004-02-22 03:48 · Score: 1

Actually, there are 2 in orbit around Mars, but only one is likely to be used at a time :-)
Re:do they use SSH ? by slutdot · 2004-02-22 04:18 · Score: 1

It depends on the OS you use to ping. I'd guess that the parent is using Linux/FreeBSD/HP-UX/VMS to ping his local machine in order to get that output.
Re:do they use SSH ? by leeward · 2004-02-22 05:43 · Score: 1

The "Deep Space Network" is used to communicate with the rovers. This network consists of 3 locations; one near Goldstone California, one near Canberra Australia, and one near Madrid Spain. Which is used of course depends on which side of the earth is currently facing Mars.

But of course the point is completely correct. Captain Midnight would need to build one gigantic antenna. And then figure out how to point it accurately and compensate for the Doppler shifts due to the motions of both Earth and Mars. And then figure out the protocol used. And probably a few other giant hurdles.
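To put a number on the Doppler hurdle, here is a small sketch using the first-order formula (shift = carrier x radial velocity / c). The 8.4 GHz carrier is an assumed, typical deep-space X-band figure, and the 10 km/s radial velocity is purely illustrative, not a measured Earth-Mars value.

```python
C_M_PER_S = 299_792_458.0  # speed of light in m/s

def doppler_shift_hz(carrier_hz: float, radial_velocity_m_s: float) -> float:
    """First-order Doppler shift, valid for speeds far below c.

    A positive radial velocity means the two ends are moving apart,
    so the received frequency is lower and the shift is negative.
    """
    return -carrier_hz * radial_velocity_m_s / C_M_PER_S

CARRIER_HZ = 8.4e9        # assumption: nominal X-band carrier
RADIAL_V_M_S = 10_000.0   # assumption: illustrative relative speed

shift = doppler_shift_hz(CARRIER_HZ, RADIAL_V_M_S)
print(f"shift is about {shift / 1e3:.0f} kHz")
```

Under those assumptions the offset is a few hundred kHz, and it keeps changing as the planets move and the Earth rotates, so it has to be tracked continuously rather than dialed in once.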
Re:do they use SSH ? by Anonymous Coward · 2004-02-22 05:44 · Score: 0

No, with UDP you would just need to implement reliability at the application layer, instead of using the transport layer reliability of TCP.
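A toy illustration of application-layer reliability on top of an unreliable datagram service: stop-and-wait with sequence numbers and acknowledgements. It runs over a simulated lossy link rather than real sockets, the names `LossyChannel` and `send_reliably` are made up for this sketch, and it says nothing about what the rovers actually use.

```python
import random

class LossyChannel:
    """Simulated datagram link that drops each packet with probability loss_rate."""
    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)

    def deliver(self, packet):
        # Return the packet, or None if it was lost in transit.
        return None if self.rng.random() < self.loss_rate else packet

def send_reliably(messages, channel, max_tries=50):
    """Stop-and-wait: resend each numbered message until its ACK comes back."""
    received = []       # what the receiver has accepted, in order
    expected_seq = 0    # receiver state: next sequence number it wants
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            pkt = channel.deliver((seq, payload))
            if pkt is None:
                continue                  # data lost: sender times out and retries
            if pkt[0] == expected_seq:    # new data: accept it exactly once
                received.append(pkt[1])
                expected_seq += 1
            # A duplicate is discarded, but it is still acknowledged.
            ack = channel.deliver(pkt[0])
            if ack == seq:
                break                     # ACK arrived: move on to the next message
        else:
            raise RuntimeError(f"gave up on message {seq}")
    return received, transmissions

msgs = ["cmd-a", "cmd-b", "cmd-c"]
got, sent = send_reliably(msgs, LossyChannel(loss_rate=0.3, seed=1))
print(got, "delivered using", sent, "transmissions")
```

Stop-and-wait is the simplest possible choice and would be painfully slow with a multi-minute round trip, since every message waits for its own ACK; that cost is part of the argument for a purpose-built protocol.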
Re:do they use SSH ? by Phil+Karn · 2004-02-22 09:05 · Score: 2, Informative

As challenging as the links are, they are very well modeled; the signal-to-noise ratio can usually be accurately predicted to a fraction of a dB. This allows the telecom team to confidently schedule downlink sessions at the highest data rate that the link can handle without a significant risk of data loss.
Because very strong forward error correction coding is used, the link tends to be "brittle"; as long as you stay just under the maximum allowable data rate, it will work perfectly. So a lot of work goes into making those accurate link predictions.
But data can still be lost if the signal-to-noise ratio takes an unexpected dip. The most likely cause is rain at the earth station site, as the weather is not as easily predicted and water is a strong absorber of X-band radio energy. Most of the DSN sites are in deserts for just this reason. But even if data is lost, it can be retransmitted later as it is stored on the rover until explicitly deleted.
Re:do they use SSH ? by Anonymous Coward · 2004-02-22 10:48 · Score: 0

"Most of the DSN sites are in deserts..."

California DSN is in a desert.

Madrid is farmland

Canberra is a (very scenic!) valley in the forested mountains just south of the city.
Re:do they use SSH ? by Phil+Karn · 2004-02-22 12:51 · Score: 1

Right, I've been to two of the three (Goldstone and Canberra). After several weeks in Australia, I saw my first kangaroos at the DSN site. I haven't been to Madrid, but photos of the place imply it's fairly dry. Or at least a lot drier than the east coast of the US where I grew up.
It does seem strange to have located the Australian site in Canberra when so much dry and isolated desert is available in the rest of the country.
A JPL tech report gives some rainfall statistics on the three sites. Goldstone is certainly the driest of the three, there's no question about that.
Re:do they use SSH ? by Anonymous Coward · 2004-02-23 12:12 · Score: 0

It does seem strange to have located the Australian site in Canberra when so much dry and isolated desert is available in the rest of the country.

No doubt a politician is to blame.
Re:do they use SSH ? by Anonymous Coward · 2004-02-24 18:03 · Score: 0

No Capt. Midnite stuff here, unless you happen to have a 34 meter dish and 10's of kW uplink power and a hydrogen maser. Also need a fairly high performance modulator to send and receive the data at all of 8 bits/second. (slower is NOT easier, when it's that slow... that implies you need to know your frequency (and hold it) to better than 1 Hz. That's about a tenth of a part per billion.)

Zorching into a commercial transponder is child's play by comparison (a mere 40,000km away, and you only need a few hundred watts)
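The "tenth of a part per billion" figure can be checked with one division. The sketch below assumes a carrier of roughly 8.4 GHz, a typical deep-space X-band value; the comment itself does not state the exact frequency.

```python
def fractional_stability(tolerance_hz: float, carrier_hz: float) -> float:
    """Allowed frequency error expressed as a fraction of the carrier."""
    return tolerance_hz / carrier_hz

CARRIER_HZ = 8.4e9  # assumption: nominal X-band carrier frequency

frac = fractional_stability(1.0, CARRIER_HZ)
print(f"{frac:.2e} of the carrier, i.e. {frac * 1e9:.2f} parts per billion")
```

Holding 1 Hz at that carrier works out to roughly 0.12 parts per billion, which is in line with the comment and is the kind of stability a hydrogen maser is there to provide.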

Pissed Martians by Tablizer · 2004-02-21 17:53 · Score: 5, Funny

The Martians are pissed that the repair labor was outsourced to Earth.

--
Table-ized A.I.

Re:Pissed Martians by MicroBerto · 2004-02-22 08:26 · Score: 1

And the Indians are pissed that the Americans were able to do it in a cost-effective and internal manner.

--
Berto

What's the big deal?? by prakslash · 2004-02-21 17:56 · Score: 4, Insightful

Unless you are a lay person, I don't understand what the big deal is.

If it was the hardware that got fried and they miraculously fixed that, I would understand but this was just a software glitch.

I routinely reboot and reprogram machines in our data-center that is 2000 miles away from me.

As long as all hardware components are working and there is connectivity to the machine, it doesn't matter whether the machine is a few miles away or a million miles away.

Re:What's the big deal?? by Gizzmonic · 2004-02-21 18:06 · Score: 5, Funny

I routinely reboot and reprogram machines in our data-center that is 2000 miles away from me.

As long as all hardware components are working and there is connectivity to the machine, it doesn't matter whether the machine is a few miles away or a million miles away.

You are too humble, friend. What you do routinely and without thinking, is nothing less than a miracle of modern science. A miracle that you take part in every day. And because of men like you, we don't have to rely on the abacus anymore. We sent a pentium to the Moon, and soon, Mars will be colonized by G5s. America salutes you, for all the things that you do.....

Like a rock! I was strong as I could be!

Ooooooohh! Like a rock!

--
(-1, Raw and Uncut is the only way to read)
Re:What's the big deal?? by mattkime · 2004-02-21 18:06 · Score: 1, Redundant

As long as all hardware components are working and there is connectivity to the machine, it doesn't matter whether the machine is a few miles away or a million miles away.

...and I suppose you have the entire news media providing constant updates to the world about your server reboots.

Actually, it is interesting only because it's NASA and it happened on Mars. NASA projects tend to have circumstances a bit different from most of us.

--
Know what I like about atheists? I've yet to meet one that believes God is on their side.
Re:What's the big deal?? by dellis78741 · 2004-02-21 18:15 · Score: 4, Insightful

The tricky part here was that the 'hardware connectivity' depended on 'software functionality'. Try maintaining a machine a block away if the communication link requires both ends to point a satellite dish at an orbiting satellite and that pointing relies on software functioning correctly.

--
======= ~\_/~\_O Burmese
Re:What's the big deal?? by FTL · 2004-02-21 18:19 · Score: 4, Insightful
I routinely reboot and reprogram machines in our data-center that is 2000 miles away from me.
As long as all hardware components are working and there is connectivity to the machine, it doesn't matter whether the machine is a few miles away or a million miles away.
There are some fundamental differences, my friend:
- If you screw up leaving the computer unbootable, you get local tech support to check the console and fix it. NASA on the other hand doesn't have tech support on Mars.
- If you hose the server, it means a day's worth of reinstallation. If NASA hoses their rover, they just lost $300,000,000.
- You can poke around the system and see what's wrong. NASA has a harder time since their lag time is 20 minutes.
- You can download core dumps, NASA were operating on the low-bandwidth antenna which meant looking at file sizes, time stamps, selected lines, but not file contents.
- You have your boss breathing down your neck (hoping for success), NASA have the international media breathing down their necks (hoping for a disaster).
--
Slashdot monitor for your Mozilla sidebar or Active Desktop.
Re:What's the big deal?? by updog · 2004-02-21 18:21 · Score: 4, Insightful

There is a big difference between this, and your example of forcing a controlled reboot of your remote machines.
Spirit was in a constant reboot cycle, and the fact that they could even communicate with it long enough to bypass the problem was an accomplishment (and lucky).
It would be more similar to your remote data-center machine suddenly going offline and you have no idea why, and you are unable to ssh to it, and you fix it by running through potential scenarios and finding that the problem could have been due to mounting a certain partition, then discovering that there's an exploit in ICMP that allows you to hack the kernel so it doesn't mount that partition.
Re:What's the big deal?? by amRadioHed · 2004-02-21 18:24 · Score: 4, Insightful

Are you forgetting that the latency when communicating with Mars averages around 1200000 ms? I'd say that when you have to wait 20 minutes to see the result of anything you do you're going to have to substantially change your debugging strategy.

--
We hope your rules and wisdom choke you / Now we are one in everlasting peace
Re:What's the big deal?? by afidel · 2004-02-21 18:27 · Score: 5, Interesting

Actually I remember NASA doing a hardware repair from most of the way across the solar system. One of the deep space probes was starting to have a problem sending signals. Some bright mind at NASA looked at the circuit diagram and figured out that a single component (resistor, cap, can't remember) was starting to fail, and that there was a way to recondition the part. So they came up with a program that basically intentionally overstressed that component path, and the extra energy heated up the part and reconditioned it so that the unit was back to working condition.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:What's the big deal?? by NymblZ · 2004-02-21 19:00 · Score: 3, Insightful

As long as all hardware components are working and there is connectivity to the machine, it doesn't matter whether the machine is a few miles away or a million miles away.

That's just it - consider the stress those rovers are enduring or might encounter: subzero temperatures down to -200F, out-of-the-blue (red?) sandstorms, gamma radiation, and who knows what else out there that could suddenly fsck with the systems or scramble internal data? Your average Dell rack will never have to deal with any of those things.

--
-- NymblZ
Ignorance is a sty in the mind's eye
Re:What's the big deal?? by Evil+MarNuke · 2004-02-21 19:42 · Score: 0, Troll

Oh please!! An exploit in ICMP?! It's more likely they have a system like Sun's serial consoles.

--
The journey is better then the end.
Re:What's the big deal?? by cookiepus · 2004-02-21 19:47 · Score: 4, Insightful

I'd say that when you have to wait 20 minutes to see the result of anything you do you're going to have to substantially change your debugging strategy.

Please! Back in the day people would write programs on paper, mail them in an envelope to a computing center somewhere, and get results weeks later.

THAT was pressure not to fuck up.

--
Ecce Europa - Web Design for Business
Re:What's the big deal?? by Anonymous Coward · 2004-02-21 20:01 · Score: 0

It was a hypothetical example - simply to point out how difficult debugging the problem was. Jeez!
Re:What's the big deal?? by Anonymous Coward · 2004-02-21 20:04 · Score: 0

yep trying to interrupt a reboot loop with a 10 minute delay there and then waiting another 10 minutes to get some feedback if it worked takes a little luck. you might spend hours trying to get the loop to stop because there is only a narrow window that you can stop the cycle. if you miss it, you have to keep trying.
Re:What's the big deal?? by jelle · 2004-02-21 20:38 · Score: 3, Insightful

But at NASA, you have a local replica of the whole system sitting in the lab next door, you're in a team of professionals that if necessary can calculate the most probable results of particular radiation hitting your system under a given angle, or can tell you the power usage and temperature effect of the system components given a particular subroutine, or can dream low-level correct assembly for the platform under study, plus the vendor has a couple of on-line support guys sitting in chairs in the corner of your office waiting for your activation command (which is the word "huh?")...

--
--- Hindsight is 20/20, but walking backwards is not the answer.
Re:What's the big deal?? by Matrix9180 · 2004-02-21 21:28 · Score: 4, Insightful

Did you RTFA? The rover was rebooting over and over because it was using up all of its memory... then eventually the batteries were low so it went into a sort of 'safe mode' where only the absolute minimum was loaded, and that's when NASA was able to communicate with it again...

It was nothing like what you described, just a VERY well designed system (though it would have been somewhat better had the system been able to go straight to "safe mode" after the initial critical error (running out of memory))

Did the people with mod points RTFA? Score 5 Insightful?

And no, I'm not new to /. ;)

--
120chars for a sig is teh suck
Re:What's the big deal?? by Anonymous Coward · 2004-02-21 21:38 · Score: 0

Wow now I know why nasa was hoarding all those h-Card bootloaders.
Re:What's the big deal?? by You're+All+Wrong · 2004-02-21 22:10 · Score: 3, Interesting

There was also pressure not to drop your stack of punched cards in those days!

(hint - draw a diagonal line across their top edges so you can get them in order again quickly.)

Some people seem not to know why "batch" files were so-called, it seems.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:What's the big deal?? by tkrotchko · 2004-02-22 03:30 · Score: 1

After you dropped the deck once, you learned about those magic "sequence numbers".

--
You were mistaken. Which is odd, since memory shouldn't be a problem for you
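For anyone who never handled a deck: the "sequence numbers" trick relies on the convention of keeping code in columns 1-72 and a card number in columns 73-80, so a dropped deck can be put back in order by sorting on that field. A minimal sketch, with `make_card` and `resequence` as made-up helper names and the four-line program purely illustrative:

```python
import random

def make_card(statement: str, seq: int) -> str:
    """Build an 80-column card image: code in columns 1-72, sequence number in 73-80."""
    return f"{statement[:72]:<72}{seq:08d}"

def resequence(deck):
    """Restore a shuffled deck by sorting on the sequence field (columns 73-80)."""
    return sorted(deck, key=lambda card: int(card[72:80]))

program = ["      READ 10, X", "      Y = X * 2", "      PRINT 20, Y", "      END"]
# Numbering by tens leaves room to slip new cards in between later.
deck = [make_card(line, (i + 1) * 10) for i, line in enumerate(program)]

dropped = deck[:]
random.Random(42).shuffle(dropped)   # simulate dropping the deck on the floor
assert resequence(dropped) == deck
```

A mechanical card sorter did the same job one column at a time; the diagonal line across the top edge is the zero-cost version.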
Re:What's the big deal?? by Endive4Ever · 2004-02-22 03:30 · Score: 1

I hope Mars will be colonized by something more than an Apple trademark. IBM's 'PowerPC' architecture is much bigger than the little snippet that Apple gives a 'Gx' name to.

--
---
Re:What's the big deal?? by Lobo_Louie · 2004-02-22 03:39 · Score: 3, Funny

We salute you and Bud Light salutes you, Mr. Three Finger Remote Computer Rebooter.
>You are too humble, friend. What you do routinely and without thinking, is nothing less than a miracle of modern science.
Re:What's the big deal?? by YetAnotherDave · 2004-02-22 03:55 · Score: 2

>> then eventually the batteries were low so it went into a sort of 'safe mode'

No, what I read in TFA was that they interrupted the boot cycle, and told it to start without mounting the flash filesystem. Pretty normal practice for dealing with a sick filesystem under vxWorks.

What's impressive is that they can maintain a stable enough network connection to another fucking planet to do this. I routinely do it with systems running vxWorks on _this_ planet, but even then keeping a reliable connection is the tricky part...
Re:What's the big deal?? by DerekLyons · 2004-02-22 06:06 · Score: 2, Insightful

But at NASA, you have a local replica of the whole system sitting in the lab next door.
A lab that resembles the rover on Mars less than you might think. You see, the lab is rebooted frequently, and equally frequently has its configuration reset to test one thing or another. The rover on Mars has been running for months. (This is the difference that led to the problems they are currently debugging.)
you're in a team of professionals that if necessary can calculate the most probable results of particular radiation hitting your system under a given angle, or can tell you the power usage and temperature effect of the system components given a particular subroutine,
Of course none of the guys have access to or experience with a system that has been exposed to severe environments, operated for months at a time, etc.. *Except* for the two irreplaceable examples sitting on the Martian surface. Nor are they *entirely* certain of the effects of those exposures and changes, *except* by examining the two irreplaceable examples sitting on the Martian surface.

Certainly the JPL/NASA guys are smart, experienced with other probes, and have massive resources backing them up. But they also have some heavy odds against them.
Re:What's the big deal?? by You're+All+Wrong · 2004-02-22 07:03 · Score: 1

Some would say "deck-drop insurance" was for weaklings!

Did you ever get hold of an 80-column reader (rather than sorter)? Did you use the newly acquired code-space, or stick to 72?

OMG -- I've just realised that I can say FAP without appearing rude!

(If anyone's lost, google for the history of punched card machines, it's a fun, but geeky, piece of computer history, perfect for a Sunday afternoon. Back in the days when not everything computer oriented was powers of two (72 columns corresponded to two 36-bit words, for example).)

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:What's the big deal?? by rossumtech · 2004-02-22 07:50 · Score: 3, Informative

Here's a link to the NASA press release describing all the details to that fix of the Galileo orbiter. I remembered it because I sometimes work at JPL and walked into a lab where a JPL-er was packing up what looked like a home-brew old time reel-to-reel tape player. It turned out that it was the sister device to the Galileo flight system and the guy I was talking to was one of the brains who had figured out the fix! JPL press release
Re:What's the big deal?? by Phil+Karn · 2004-02-22 09:11 · Score: 1

The debugging took place over the omnidirectional antenna on the rover (that shiny vertical metal cylinder in many of the photos) precisely so the software on the rover wouldn't have to point the high-gain antenna at the earth.
The low gain antenna operates at much lower data rates, which is one of the reasons the debugging took so long.
Re:What's the big deal?? by Phil+Karn · 2004-02-22 09:35 · Score: 2, Informative

An earlier example was Voyager 2. This spectacularly successful mission almost didn't make it even to Jupiter. Its primary command receiver failed, and the AFC (automatic frequency control) in its backup also failed. That meant the receiver was listening only to a single frequency with almost no tolerance for error. And the precise frequency was a function of component drift, which was in turn mainly a function of receiver temperature.
The failed components never recovered, but JPL was able to work around it. They constructed an elaborate thermal model of the spacecraft to predict the precise temperature (and therefore the operating frequency) of the command receiver. Everything but the kitchen sink went into this model: the effect of attitude on solar heating, the self-generated heat from the electronics, the effect of turning various instruments on or off, the time lags due to structural heat capacities, everything. And it has worked fine ever since.
JPL doesn't get nearly the credit they deserve for their track record in rescuing missions from seemingly fatal failures like these. There's still a pervasive public myth (sustained by the human space flight side of NASA) that only humans in space can fix things when they break. But they seriously overestimate the astronauts' abilities, and they greatly underestimate what a bunch of really smart people can often do from the ground.
Re:What's the big deal?? by Anonymous Coward · 2004-02-22 11:36 · Score: 0

Did you RTFA? The rover was rebooting over and over because it was using up all of its memory... then eventually the batteries were low so it went into a sort of 'safe mode' where only the absolute minimum was loaded,

What? The article said no such thing.

Uh-oh by z0ink · 2004-02-21 17:57 · Score: 5, Funny

"We recognized early in the planning process that the flash file system had a limited capacity for files."

Sounds like NASA forgot to empty the rover's recycle bin. =)

--
Steal This Sig

Re:Uh-oh by LnxAddct · 2004-02-21 18:18 · Score: 3, Funny

I've thought long and hard on this topic and yes on Windows it is accurately called the recycle bin because you don't get rid of the junk you put in there, it gets reused in some other part of your system. You put junk in, the junk is modified into other junk and then sent back to create new system dlls. In Linux (and I believe Macs) it is accurately called the trash can because what we put in there is thrown out for good, we don't have our junk recycled to create more, but different, junk :)
Regards,
Steve
Re:Uh-oh by brendan_orr · 2004-02-21 18:46 · Score: 3, Funny

Nah, Linux, Mac OS X, *BSD, and other *nix users have /dev/null as a trash can.
Re:Uh-oh by ProKras · 2004-02-21 18:59 · Score: 1

Sounds like NASA forgot to empty the rover's recycle bin. =)

Fortunately, the folks at JPL can at least say that causing Spirit's memory error wasn't as stupid as Microsoft's first foray into DVRs a couple years back.
Re:Uh-oh by AhBeeDoi · 2004-02-21 20:09 · Score: 3, Funny

Nah, Linux, Mac OS X, *BSD, and other *nix users have /dev/null as a trash can.
Trash can? More like a neutron star, 'cause anything you put in it is totally and absolutely gone.
Re:Uh-oh by Anonymous Coward · 2004-02-21 20:55 · Score: 0

all that money spent on the software, and no logrotate?
Re:Uh-oh by Anonymous Coward · 2004-02-21 21:17 · Score: 0

In OS X, you have the trash and /dev/null. When you trash something, by default it's gone from the filesystem but still on the disk (lost in space floating around aimlessly). When you use the secure empty trash option it's overwritten and gone forever (black hole). I don't know whether you can do data recovery on anything sent to /dev/null (disappears into space a movie where anything can happen)

The proper fix... by Dan+East · 2004-02-21 17:57 · Score: 3, Insightful

...would have been to have "fixed" the problem before the hardware left earth. This "bug" (or more accurately, known limitation of the filesystem) should have been discovered here on earth if the rover had been properly tested.

The only real bug was the inability of the system to properly handle running out of file entries (or more specifically, consuming too much RAM as the number of file entries increased). However the software should never have stressed the filesystem to that degree in the first place.

Dan East

--
Better known as 318230.

Re:The proper fix... by tiny69 · 2004-02-21 18:15 · Score: 1

That is a problem with NASA's faster-better-cheaper approach to space flight. There's a good chance that a catastrophic bug will be missed. NASA lost a $125 million orbiter on Mars due to a metric conversion error. A simple conversion check was never done!!
http://clive.canoe.ca/CNEWSHeyMartha9911/10_metric.html
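For what a "simple conversion check" could look like, here is a minimal sketch. It assumes the widely reported version of the story, where impulse figures in pound-force-seconds were consumed as if they were newton-seconds; the `Impulse` class and its unit strings are made up for illustration and are not anything NASA actually used.

```python
# Hypothetical sketch: carry the unit with the number, so a mismatch
# has to be converted explicitly instead of slipping through silently.
LBF_S_TO_N_S = 4.4482216152605  # newton-seconds per pound-force-second

class Impulse:
    """An impulse value tagged with its unit (illustrative names only)."""

    def __init__(self, value, unit):
        if unit not in ("N*s", "lbf*s"):
            raise ValueError(f"unknown impulse unit: {unit!r}")
        self.value = value
        self.unit = unit

    def to_newton_seconds(self):
        """Return the value in SI units, converting if necessary."""
        if self.unit == "N*s":
            return self.value
        return self.value * LBF_S_TO_N_S


# A consumer that only ever calls to_newton_seconds() cannot mix the two up.
print(Impulse(1.0, "lbf*s").to_newton_seconds())
print(Impulse(1.0, "N*s").to_newton_seconds())
```

The point is only that a bare float carries no unit, so the check has to live somewhere in the interface between the two teams' code.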

--
Go not unto/. for advice, for you will be told both yea and nay (but have nothing to do with the question)
Re:The proper fix... by Chester+K · 2004-02-21 18:36 · Score: 4, Funny

The only real bug was the inability of the system to properly handle running out of file entries (or more specifically, consuming too much RAM as the number of file entries increased). However the software should never have stressed the filesystem to that degree in the first place.

When you can write an embedded operating system that can gracefully and automatically recover from every possible thing that might ever go wrong, perhaps you should send your resume to NASA.

--

NO CARRIER
Re:The proper fix... by KewlPC · 2004-02-21 19:15 · Score: 5, Informative

Score: -1, Didn't Read Article

The rovers were extensively tested before launch. For example, NASA took about 100000 pictures with the test panoramic cameras under varying conditions to see how they would react. NASA put a test rover on a tilting platform to see how far over the rover could tilt before it capsized, to find out at what angle the electric motors could no longer drive the rover up a hill, etc.

This limitation of the filesystem was known about ahead of time. If you had read the article, you'd have known that. They had a utility to clean out the rover's filesystem, but a storm at the Deep Space Network site that was supposed to transmit it prevented the second half of the utility from being uploaded to the rover. And before you say anything else, the article also mentioned that the people involved had thought of this possibility ahead of time.
Re:The proper fix... by AaronStJ · 2004-02-21 22:11 · Score: 1

Score: -1, Didn't Read Article

Score: -1, neither did you

The rovers were extensively tested before launch. For example, NASA took about 100000 pictures with the test panoramic cameras under varying conditions to see how they would react. ... This limitation of the filesystem was known about ahead of time. If you had read the article, you'd have known that.

From Article: "It was recognized just after [the June 2003] launch that there were some serious shortcomings in the code that had been put into the launch load of software." So they didn't realize that too many files would be a problem until after launch.

Also in the article "The data management team's calculations had not made any provision for leftover directories from a previous load still sitting in the flash file system."

So it's clear that their on-the-ground testing didn't catch the first bug, despite the rigorous testing you described. Which makes one wonder if they really did such rigorous testing. The grandparent is right.

--
Stupid like a fox!
Re:The proper fix... by Anonymous Coward · 2004-02-22 01:28 · Score: 0

Uh, most people wouldn't want to work there. It's a typical government agency and really not that great a work environment.

The engineers end up writing all that sucky code and we end up with stupid problems like this. It should've never happened but the programmers suck.
Re:The proper fix... by cetialphav · 2004-02-22 05:13 · Score: 2, Insightful

So it's clear that their on-the-ground testing didn't catch the first bug, despite the rigorous testing you described. Which makes one wonder if they really did such rigorous testing. The grandparent is right.

But this was probably intentional. The flight to Mars is a long one, so there is plenty of time to test while the rover is in transit. Before launch, you need to make sure that the hardware works and is reliable. Since they can upload new versions of software, they can do much of the testing after the launch. This is one of the things that allowed them to hit aggressive launch windows.

This looks like it was less a technical failure and more a communications failure. Other rover operations were dependent on the utilities running to clear up flash space. When that did not happen on time, the right people were not told and so they assumed there would be more space available.
Re:The proper fix... by DanDanknick · 2004-02-22 07:17 · Score: 1

One problem with a "don't launch until it is fixed" requirement is that there are only certain windows available for the orbital paths selected. So as long as you can FTP up a new image, I think it's reasonable to launch with just a bootloader running.

The first rover with the VXW priority inversion problem was launched with an unknown "keeps rebooting on the pad" problem, too.

VXW may not be the most streamlined embedded OS around, but there are a lot of brain cells on tap that understand its behavior.

Dan Danknick
(VXW user, but not evangelist)
Re:The proper fix... by KewlPC · 2004-02-22 08:15 · Score: 1

Which makes one wonder if they really did such rigorous testing.

There is video on the MER website of them putting a test rover through its paces, including the tilt testing I mentioned.

You lose.

Hindsight by FTL · 2004-02-21 17:57 · Score: 5, Insightful

The article (I know, I know, this is Slashdot) is really good. It contains everything that is missing from traditional media. The story, the background, technical details, and follow through.

Granted, mainstream media have to keep their coverage dumbed down if Joe Public is going to read it. But what really bugs me is the lack of follow-up. We hear about poorly understood events as they are unfolding, then never hear about them later when they are completely understood.

A recent example is the gangway between ship and shore at the QM2's drydock. It collapsed killing lots of people, an investigation was launched. Why did it collapse? At the time it wasn't known. I'm sure it's known now, but there's been absolutely no followup.

This article about the rover is great not so much because of the level of detail but because it reports on an event with the benefit of hindsight.

--
Slashdot monitor for your Mozilla sidebar or Active Desktop.

Re:Hindsight by Jeremy+Erwin · 2004-02-21 18:31 · Score: 2, Informative

I'm sure there will be at least some mention of the results of the investigation when it is completed and various persons are prosecuted. In the meantime, here's a relatively recent article on the investigation into the collapse.
Re:Hindsight by Frankensloot · 2004-02-21 19:05 · Score: 0

I think the idea is that if you see something in the paper/on TV/whatever that catches your eye, you'll be interested enough to follow up on it yourself. They give you just enough information to know where to look for more. Or would you really want to subscribe to the 5,000 lb. edition of the New York Times? :-)

Of course, this doesn't explain why the cable networks feel the need to force-feed us the latest dirt on Martha Stewart and Michael Jackson 24/7. And how could anyone forget the Year of O.J.?
Re:Hindsight by addaon · 2004-02-21 19:13 · Score: 2, Funny

I think this is exactly what he means. We get the beginning of the story, but then, no followup!

--

I've had this sig for three days.
Re:Hindsight by Anonymous Coward · 2004-02-21 20:11 · Score: 0

yup joe public has a very short attention span. I think 10 minutes is like tops. In fact advertisers figured out a long time ago that running lots of small 5 minute ads got more results than running in-depth lengthy ads. Short simple, now a word from our advertisers.
Re:Hindsight by Anonymous Coward · 2004-02-21 20:16 · Score: 5, Interesting

I'm a journalism undergrad at a large university. One of the points I brought up with some of our administrators is that the innumeracy and scientific illiteracy of the graduates of our program is appalling. I think this is one reason why many important stories don't get reported accurately or in depth: the writers simply don't understand the story, and don't want to understand the story. They actually feel that math and science are somehow beneath them, and that the average reader doesn't need to be bothered with the facts. So we get vagueness instead of specifics in the articles we read.

I suggested we allow j-students to substitute math or hard science minors in place of the foreign language requirement. Most graduates of college foreign language programs don't translate at a level any higher than Babelfish. It seems wasteful to force people to spend so much time learning a language that most will never use, when that time could be more productively spent introducing them to the languages of math and science, which they will undoubtedly use in the future. We'd get better reporting that way, and isn't that what going to j-school is all about? Science and technology are too important to our day-to-day lives and governance to be left to illiterates.
Re:Hindsight by Anonymous Coward · 2004-02-21 20:35 · Score: 0

Hopefully a generation of kids raised on computers instead of TV, will give them more insight into how technology works. But I agree with you, maybe there is a niche market for more in depth news. I think this ultimately is what drove information seekers away from the TV and newspapers towards the internet.
Re:Hindsight by Endive4Ever · 2004-02-22 03:39 · Score: 1

The problem is, the academic tradition has always been:

'When you flunk out of calculus, you switch to j-school.'

variation:

'When you flunk out of the english department, you switch to j-school.'

Further iterations left as an exercise for the reader.

--
---
Re:Hindsight by Anonymous Coward · 2004-02-22 05:00 · Score: 0

Actually, this was true for me, too. After investing a lot of time and effort in getting halfway through a Comp Sci degree, I decided programming just wasn't what I wanted to do with my life. I liked writing about technology more than I liked programming, and I felt that long term, a journalism degree would give me more flexibility in my career than a computer science degree would. So I switched my major to journalism.

J-schools ought to be courting comp sci, math, physics, and chemistry geeks.
Re:Hindsight by Endive4Ever · 2004-02-23 03:27 · Score: 1

You could probably make a better living as a tech writer. There is a dearth of good tech writers in the industry.

You'd have to fend off the evil marketers, and their demand that you make the docs into marketing drivel, though. Being an independent tech journalist is probably better in that regard.

--
---

What the article doesn't say by Mr2cents · 2004-02-21 18:05 · Score: 4, Insightful

What filesystem is used? Is wear leveling being used? The directory structure is apparently stored in RAM during the day (why else would it use so much RAM?), that is a good thing for reducing wear on the flash system. But what's the number of writes on the flash chips? When will that number be reached?

--
"It's too bad that stupidity isn't painful." - Anton LaVey

Re:What the article doesn't say by afidel · 2004-02-21 18:32 · Score: 4, Interesting

Never, the rovers are only going to operate for ~100 days, the number of writes for modern flash ram is 100K cycles minimum, over a million typical. So unless they are really screwing something up that shouldn't be a limitation, also distributing file placement shouldn't be a software function, good CF cards do it in the controller logic.
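To put numbers on that, here is a back-of-the-envelope calculation using only the figures in the post above (a ~100 day mission, 100K rated cycles minimum, a million typical). It is plain arithmetic, not a model of the rover's actual flash parts.

```python
def cycles_per_day_to_wear_out(rated_cycles, mission_days):
    """Erase/write cycles per day one block would have to absorb
    to reach its rated endurance by the end of the mission."""
    return rated_cycles / mission_days


# Figures taken from the post: ~100 days, 100K minimum, 1M typical.
print(cycles_per_day_to_wear_out(100_000, 100))    # minimum rating
print(cycles_per_day_to_wear_out(1_000_000, 100))  # typical rating
```

So a single block would need to be rewritten about a thousand times a day, every day, to hit even the minimum rating within the mission.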

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:What the article doesn't say by Anonymous Coward · 2004-02-21 20:57 · Score: 1, Informative

IIRC VxWorks' native filesystem is FAT16. These are not the same flash chips you'd put in your camera, rather radiation hardened components built to insane reliability specs. The batteries in the rover will probably fail before the flash memory.
Re:What the article doesn't say by SkewlD00d · 2004-02-21 22:15 · Score: 1

Industrial Flash EEPROMs last several orders of magnitude more than rated, especially Atmel... that stuff goes for 10^7-10^8 easy, even during thermal-extreme-cycling endurance testing. With ECC/FEC/R-S/Turbo coding you can effectively increase data reliability by using more bits (decreasing bit entropy). This does not take into account the radiation levels that affect the bus or other choke-points.. maybe NASA should be using fully mirrored RAM? Do they have a fully parallel backup unit? I mean for $300M, it should fly around the room and do your taxes.

--
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
Re:What the article doesn't say by hyc · 2004-02-21 22:43 · Score: 2, Interesting

Except that fully mirrored RAM would use way too much power, something that no space probe can afford.

--
-- *My* journal is more interesting than *yours*...
Re:What the article doesn't say by Anonymous Coward · 2004-02-22 07:47 · Score: 0

SRAM?
Re:What the article doesn't say by Mr2cents · 2004-02-22 11:25 · Score: 1

I know that the Flash consists of 2 chips, and both of them contain an image of the flight software. I even think that after a reboot, it boots from the other image. When you're talking about RAM, do you mean Flash?

--
"It's too bad that stupidity isn't painful." - Anton LaVey

Mod this "redundant" by Penguinshit · 2004-02-21 18:06 · Score: 5, Informative

'How do you diagnose an embedded system that has rendered itself unobservable?'

The way you do this is by having an exact duplicate of the remote system so you can set up a test with conditions as close to those under which the remote system is currently operating. You can then do a series of carefully controlled test solutions to determine the optimum prior to trying it on the "live" system.

This is the way I set up all my production systems and, barring catastrophic hardware failure (self-immolating disks and a router which just folded when its power supply burped) I've had perfect uptime.

(well, ok.. there was that one time, late at night, when I typed "reboot" in the wrong window.. but that happens...)

--

I have something in common with Stephen Hawking...

Re:Mod this "redundant" by sfisher71 · 2004-02-22 07:46 · Score: 1

The way you do this is by having an exact duplicate of the remote system so you can set up a test with conditions as close to those under which the remote system is currently operating. You can then do a series of carefully controlled test solutions to determine the optimum prior to trying it on the "live" system.
Which is, of course, exactly what they did. My son Charlie (class of 2020, MIT or CalTech, he hasn't decided yet :-) and I have been following the Mars Rovers since sol 1, and as soon as MER-A stopped sending, I googled up all available articles. One of them mentioned that there was an exact duplicate of the rover in Pasadena, and that JPL engineers were duplicating the sequence of events to figure out the best solution.
Charlie hit upon a pretty fair solution himself: send up an astronaut, have them delete the files and reinstall new software. I reminded him that the send-an-astronaut part would mean at least a 14-month round trip, and pointed out (to help him get a sense of the time involved) that 14 months ago he had just had his sixth birthday.
I think he was a little disappointed, as I'm sure he was ready to volunteer... :-)
Re:Mod this "redundant" by Anonymous Coward · 2004-02-24 18:11 · Score: 0

Having an exact duplicate is no trivial matter. Even MER-A and MER-B are not identical. At $300-400M a copy (not to mention the bodies to actually build the copy.. they were working 2 and 3 shifts to get the two flight units ready) you're not going to be building exact duplicates.

There's a variety of testbeds, but not exact replicas.

Also, bear in mind that exact replica means bit for bit copies of the memory and processor register contents. At the "safe mode" data rate of 8bps, it's going to take a while to get there.
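A rough sense of "a while", as a sketch: the 8 bps rate is from the post above, while the 120 MB memory size is only an assumption borrowed from the "only 120 megs ram?" subject line elsewhere in this thread. Protocol overhead and limited communication windows are ignored, so the real figure would be worse.

```python
def transfer_days(size_bytes, bits_per_second):
    """Days needed to move size_bytes over a link of the given raw bit rate."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86_400  # seconds per day


# Assumed: 120 MB of memory (decimal megabytes), 8 bits per second.
days = transfer_days(120 * 10**6, 8)
print(f"{days:.0f} days, about {days / 365.25:.1f} years")  # 1389 days, about 3.8 years
```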

No exception handler?? by SID*C64 · 2004-02-21 18:08 · Score: 0

With all of the money we spend on this stuff, couldn't someone have written an exception handler for this? Haven't we learned our lessons in the past about unhandled exceptions?

The article states that they are working on one now. A bit late eh? Lucky indeed.

Ran out of flash disk space. No, really. by randyest · 2004-02-21 18:09 · Score: 2, Insightful

If you RTFA you will realize that I'm not lying in the least when I say that, effectively, they ran out of flash-based "disk" space! They forgot to delete old files when updating the programs in the flash memory (which is mounted like a filesystem, or hard disk), and the OS was failing because it wanted to use that space. So it rebooted, and still had insufficient disk space, and rebooted again . . . lather rinse repeat. There was no signal because it was stuck in a reboot loop because they ran out of disk. Wow.

They fixed it by telling it to boot without using the flash (safe mode :) ), then used low-level (direct access) flash utilities to remove the old files. Reboot, mount, disk check / corruption repair, voila it works again.

We have a big 1TB NetApps server where I work, and we have so much disk space that people get lazy and don't delete files or archive old projects, then they get really confused when jobs fail, not thinking of disk space until checking everything else first. But it happens, and it's usually surprisingly hard to debug (they check a lot of other things first, sometimes even upgrading tool versions!). It's really kinda funny, in an expensive and mildly embarrassing way that the Spirit had the same problem.
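For the earthbound version of this, one cheap guard is to check free space before a job starts writing. A minimal sketch, assuming a made-up `has_room` helper and an arbitrary 5% reserve; `shutil.disk_usage` is the real standard-library call.

```python
import shutil

def has_room(path, needed_bytes, reserve_fraction=0.05):
    """Return True if `path`'s filesystem can take `needed_bytes`
    while still keeping `reserve_fraction` of the disk free.
    (Helper name and the 5% default are illustrative assumptions.)"""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    reserve = int(usage.total * reserve_fraction)
    return usage.free - reserve >= needed_bytes


if not has_room(".", 10 * 1024**3):
    print("refusing to start: less than 10 GiB usable here")
```

Failing loudly up front at least puts "disk space" at the top of the list instead of the bottom.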

--
everything in moderation

Lucky Hack? by electromaggot · 2004-02-21 18:11 · Score: 5, Insightful

"The outcome strikes me as an extremely Lucky Hack..."

The outcome does not strike me as a "Lucky Hack." They made the system flexible, that flexibility got them into some trouble, and it's also what got them out of it. Anyone else agree?

Re:Lucky Hack? by globalar · 2004-02-21 19:16 · Score: 1

I agree, when I read "Lucky Hack", I think of getting around software and hardware limitations. I don't think of using features as they were meant to be used.

From the article's description, it doesn't sound like anything to do with hacking - It sounds like offsite system administration.
Re:Lucky Hack? by KidSock · 2004-02-21 20:47 · Score: 1

They made the system flexible, that flexibility got them into some trouble, and it's also what got them out of it.

To be honest the whole problem sounds a little ... well dumb. Basically they got ENOSPC so the system sat there rebooting itself. I'm a little surprised they missed this scenario. I didn't get the impression they have some kind of "safe-mode" coded in. That would stop the system from trying to perform the errant operation over and over. Also, you could have a "panic" path that resets and deletes whatever it can (ignoring errors along the way) to restore the system to a known minimalistic working configuration and then reboot. Or make a little change every time you panic so there's at least a chance the system will wiggle its way out of a faulty cycle.
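One common way to get the "stop retrying the same thing" behaviour is a consecutive-failure counter that flips the next boot into a minimal mode. This is only a sketch of that idea, with invented names and an arbitrary threshold of three; it says nothing about how the rover's flight software actually works, and a real counter would have to live somewhere that does not depend on the filesystem that is failing.

```python
def choose_boot_mode(consecutive_failed_boots, max_failures=3):
    """Fall back to a minimal 'safe' boot after too many failures in a row."""
    return "safe" if consecutive_failed_boots >= max_failures else "normal"


def simulate(normal_boot_succeeds, attempts=10, max_failures=3):
    """Replay a sequence of boots and return the mode chosen each time."""
    failures = 0
    history = []
    for _ in range(attempts):
        mode = choose_boot_mode(failures, max_failures)
        history.append(mode)
        if mode == "safe":
            break  # stay up in minimal mode and wait for operators
        if normal_boot_succeeds:
            break  # healthy boot; a real system would clear the counter here
        failures += 1  # crashed again; remember it across the reboot
    return history


print(simulate(normal_boot_succeeds=False))  # ['normal', 'normal', 'normal', 'safe']
```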
Re:Lucky Hack? by Anonymous Coward · 2004-02-22 04:15 · Score: 0

Yes, I was going to say the same thing... That comment made me interested enough to actually read through the article, but nowhere did I see anything that was lucky.

They were just UNlucky in that not all the deletion instructions had been transferred. But then they identified and fixed the problem by low-level removing the data and uploaded a new image. Extremely lucky hack??
Re:Lucky Hack? by Anonymous Coward · 2004-02-22 11:29 · Score: 0

"Anyone else agree?"

Sure, there is a fortune cookie glued to my monitor saying:

"Good luck is the result of good planning"

Some of them are silly and some are dead on.

All these worlds.... by dmeranda · 2004-02-21 18:11 · Score: 4, Funny

"The irony of it was that the operating system was doing exactly what we'd told it to do," Klemm lamented.

Yeah, that was HAL's excuse too.

Seriously, hats off to all the JPL programmers. Proving to the Martians that there is indeed intelligent life on Earth, very intelligent.

Remote debugging pet peeve by Peter+McC · 2004-02-21 18:14 · Score: 5, Funny

My pet peeve when I'm doing remote troubleshooting is 'ifconfig eth0 down'...oops. At least NASA is smarter than that.

Peter.

--
You know what I hate? Wait, what do you like? I hate that!

Re:Remote debugging pet peeve by Anonymous Coward · 2004-02-22 07:01 · Score: 0

At least one space mission was lost in exactly that way. The radio receiver was accidentally commanded off. Since it was off, there was no way to send a command to turn it back on, and the mission ended.

Now receivers are tied directly to the spacecraft critical power bus, which cannot be turned off in space.

Re:only 120 megs ram? by Anonymous Coward · 2004-02-21 18:15 · Score: 0

Well, considering the problem was they had too many files on the flash, why would they want even more? They should have had more ram, not flash

rebooting on mars... by segment · 2004-02-21 18:17 · Score: 4, Interesting

Interesting reading:

Rebooting on Mars
By Matthew Fordahl, The Associated Press
It's a PC user's nightmare: You're almost done with a lengthy e-mail, or about to finish a report at the office, and the computer crashes for no apparent reason. It tries to restart but never quite finishes booting. Then it crashes again. And again.
Getting caught in such a loop is frustrating enough on Earth. But imagine what it's like when the computer is 200 million miles away on Mars. That's what mission controllers faced when the Mars rover Spirit stopped communicating last month.
...
Tech support for an $820 million mission is a cautious affair. Tools to recover from and fix any problem must be built into the system before launch. The systems' behaviors need to be completely understood and predictable.
"Luckily, during the design period, we anticipated that we might get into a situation like this," said Glenn Reeves, who oversees the software aboard the Mars rovers Spirit and Opportunity at NASA's Jet Propulsion Laboratory.
For stability, reliability and predictability, mission designers did not bust the budget and design the hardware or software from scratch. Instead, they turned to hardware and software that's been used in space before and has a proven track record on Earth as well.
"The advantage of using commercial software is it's well-known, and it's well deployed," said Mike Deliman, an engineer at Alameda-based Wind River Systems Inc., which made the rovers' operating system. "It has been used throughout the world in hundreds of thousands of applications."
The operating system, VxWorks, has its roots in software developed to help Francis Ford Coppola gain more control over a film editing system. But the developers, David Wilner and Jerry Fiddler, saw a greater potential and eventually formed Wind River, named for the mountains in Wyoming. VxWorks became a formal product in 1987.
rest of article

--
MoFscker

Re:rebooting on mars... by Anonymous Coward · 2004-02-21 19:06 · Score: 0

That begs the question, why did they use a known broken OS? Why bundle in software to compensate for this instead of using an OS that can handle the job?
Re:rebooting on mars... by JWSmythe · 2004-02-21 19:28 · Score: 2, Funny

I wonder how many Microsoft salesmen were pushing for putting WinXP on it.. :)

--
Serious? Seriousness is well above my pay grade.
Re:rebooting on mars... by hazem · 2004-02-21 19:48 · Score: 2, Funny

It's a PC user's nightmare: You're almost done with a lengthy e-mail, or about to finish a report at the office, and the computer crashes for no apparent reason. It tries to restart but never quite finishes booting. Then it crashes again. And again.

Gee... that sounds a lot like the last worm to hit my mom's Dell Laptop running Windows XP.
Re:rebooting on mars... by Anonymous Coward · 2004-02-21 20:48 · Score: 2, Informative

1. It's not a known broken OS. It's an OS that doesn't have any failsafe to protect against running out of storage, and user error caused it to allocate too many files. The people who were keeping track of old files from a failed transfer weren't talking to the guys that allocated new files, so nobody knew how many files were actually allocated and they ran out.

2. That's not what "begs the question" means. http://skepdic.com/begging.html

3. Based on 1 and 2, it is proved by example that you=monkey puppet.
Re:rebooting on mars... by Pikhq · 2004-02-22 04:56 · Score: 1

Gee... I had no idea that Wind River Systems wrote gcc, ssh, Linux, bash, Sys V Init, etc.

--
echo "rm -rf ~/* ; echo "echo "Exit" ; exit" > ~/.bashrc ; exit" > ~user/.bashrc
Re:rebooting on mars... by Technonotice_Dom · 2004-02-22 05:49 · Score: 1

It's a PC user's nightmare: You're almost done with a lengthy e-mail, or about to finish a report at the office, and the computer crashes for no apparent reason. It tries to restart but never quite finishes booting. Then it crashes again. And again.

Getting caught in such a loop is frustrating enough on Earth. But imagine what it's like when the computer is 200 million miles away on Mars.
You mean NASA spent billions on sending a rover to Mars just to type an e-mail!? Should've said... I've got a ton of P100s here if they wanted them ;-)
Re:rebooting on mars... by Anonymous Coward · 2004-02-22 16:52 · Score: 0

I used the VxWorks Flash file system back in V5.2 (and early V5.3). It was full of bugs so I'm not surprised it crashed!

Doh by BlueTrin · 2004-02-21 18:17 · Score: 1

Klemm explained that as data is collected by Spirit, files are created and stored in the flash file system until a communications window opens -- an opportunity to transmit the data either directly to Earth or to one of the two orbiters circling the Red Planet. Then the files are transmitted. They are still held in the flash system until retrieved and error-corrected on Earth.

They should just have ticked the "autoaccept and minimize" checkbox .

--
Don't you know it is now both immoral and criminal to think beyond the next quarterly report?

not really... by rebelcool · 2004-02-21 18:17 · Score: 3, Informative

on projects such as this, the design specs would've been frozen several years ago, and then would've been conservative for the time, using proven technology.

Another factor in this is the safety of the flash ram. It is rad-hardened and built with tons of extra error correction which again, requires years of testing and special design considerations. And is extremely expensive.

--

-

Lucky Hack? by SuperKendall · 2004-02-21 18:21 · Score: 5, Insightful

Your post is the only thing that strikes me as a "Lucky Hack" here. They included the ability in the design to remotely disable booting from flash and upload new boot images, in what way is that a "hack"? All this is just foresight in design to include as many possible recovery modes as they could.

Basically, they rebooted from a recovery image (sent via radio) and then proceeded to do low-level fixes on Flash memory and then ran a chkdisk. If I do something similar via recovery disk or CD, I don't get a lot of people telling me that it was a "Lucky Hack" that I could boot off of CD!!!

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Seems like a stupid mistake to me by Anonymous Coward · 2004-02-21 18:22 · Score: 0

I know next to nothing about programming, but I'm a fairly good armchair quarterback.

"Spirit attempted to allocate more files than the RAM-based directory structure could accommodate. That caused an exception..."

For an agency that usually tries to think of everything, doesn't this seem like a stupid lack of planning? To not have any error handling to catch something that is trying to allocate more memory than what is available? From a layman's perspective, this seems like a rookie goof. Please correct me if I'm wrong.

"...just in case, the team is working on an exception-handler routine that will more gracefully recover from an allocation failure."

I think anything would be more graceful than 'totally puke and get stuck in a futile reboot cycle'.

Our tax dollars at work...

One reasonable analogy by Anonymous Coward · 2004-02-21 18:22 · Score: 1, Insightful

One lesson we can learn from the Spirit problems that really and truly does directly apply on earth:

Just in case of a worst case scenario, always make sure you have physical access to the machines.

Re:One reasonable analogy by zcat_NZ · 2004-02-21 21:15 · Score: 5, Informative

If you're really worried about your remote server being unreachable, here's what I would suggest doing:

Have a hardware watchdog. If the machine is lost or confused, it reboots itself.

Have it come up in a known state, fire off a few broadcast packets to the sysadmins, and run sshd but basically nothing else. Stay there for a minute or so.

If nobody's tried to log in and halt the boot process, carry on booting. With luck the problem was transient. Worst case the problem still exists, you reboot, and the admins get another chance to log in.

From the description of how they got Spirit back, it looks like this is exactly how it was set up.

Who'da thunk it!!
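
The "come up in a known state, wait a bit, then carry on" idea above could look roughly like this. This is only a sketch in Python under my own assumptions: boot, halt_requested and the two callbacks are hypothetical names, and a real machine would do this in its init scripts rather than in one function.

```python
import threading


def boot(hold_seconds, halt_requested, start_safe_services, start_everything_else):
    """Start the minimal services, then give an admin a window to halt the boot."""
    start_safe_services()
    # Event.wait returns True if someone set the flag before the timeout expired.
    if halt_requested.wait(timeout=hold_seconds):
        return "held in safe mode"
    start_everything_else()
    return "full boot"


log = []
halt = threading.Event()

# Nobody logs in during the hold window: the boot carries on.
first = boot(0.05, halt, lambda: log.append("sshd"), lambda: log.append("rest"))

# An admin requests a halt: the machine stays in the known state.
halt.set()
second = boot(0.05, halt, lambda: log.append("sshd"), lambda: log.append("rest"))
print(first, second, log)
```

How long hold_seconds should be is exactly the timing question that comes up in the replies.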

--
455fe10422ca29c4933f95052b792ab2
Re:One reasonable anology by Anonymous Coward · 2004-02-21 22:52 · Score: 0

well, this presupposes that what caused the problem in the first place also didn't mess up the hardware watchdog as well. and:

If nobody's tried to log in and halt the boot process, carry on booting

this needs a lot of careful planning/thinking as well, in terms of timing, before the system decides to "carry on booting". How long should the rover wait?
Re:One reasonable anology by Fallen_Knight · 2004-02-21 23:18 · Score: 3, Interesting

considering the distance i'd say a while, a couple of hours doesn't make much difference when you've got a billion $$ probe on another planet, it surviving is more important than a fast boot time heh. and you can always login and tell it to continue booting
Re:One reasonable anology by sjames · 2004-02-22 03:36 · Score: 3, Interesting

well, this presupposes that what caused the problem in the first place also didn't mess up the hardware watchdog as well.

Nothing's perfect. It also presupposes that the sun didn't explode and vaporize the Earth and that God didn't get ticked off and squish it with his thumb. So what?

A watchdog is a VERY simple device. A simple countdown timer, a control register with associated address decode, etc. It's quite unlikely to fail. When the timer hits zero, it strobes reset. Any access to the port address resets the countdown timer.
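
To show just how simple, here is a toy software model of that countdown in Python. It is a simulation only, with made-up names (Watchdog, kick, tick); a real watchdog is a hardware counter and the "kick" is a read or write to its port.

```python
class Watchdog:
    """Toy model of a hardware watchdog: a countdown that strobes reset at zero."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.resets = 0

    def kick(self):
        """Stands in for any access to the port address: reload the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """One clock tick; returns True when the reset line is strobed."""
        self.remaining -= 1
        if self.remaining == 0:
            self.resets += 1
            self.remaining = self.timeout_ticks  # counter restarts after reset
            return True
        return False


# Healthy software kicks the watchdog every second tick: no resets.
healthy = Watchdog(timeout_ticks=3)
for t in range(10):
    healthy.tick()
    if t % 2 == 1:
        healthy.kick()

# Hung software never kicks it: the machine is reset every three ticks.
hung = Watchdog(timeout_ticks=3)
for t in range(10):
    hung.tick()

print(healthy.resets, hung.resets)
```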

Some dual processor boards are even set up to alternate which is the boot processor, so they can come up with a single failed CPU.

There is always some sort of problem that precludes recovery. No amount of software or clever design can help you if the device is destroyed. However, that doesn't mean don't even try.
Re:One reasonable anology by coyotedata · 2004-02-22 15:18 · Score: 1

or they could have programmed Opportunity to do a fix if need be and vice versa - but this problem did not happen all by itself - there was a genius at JPL who did it, not Spirit.
Re:One reasonable anology by zcat_NZ · 2004-02-24 16:24 · Score: 1

"A few minutes" - it really doesn't matter. If they miss it this time and the problem wasn't a one-off, they can catch it the next time it reboots.

In the case of an earth-based server, however long it takes a normal sysadmin to log in and halt a process or two (a minute would be more than enough for me, allow 5 minutes for slower typists)

In the case of the Spirit Rover there are other systems in orbit they can program to detect when spirit reboots, log in, and halt further booting leaving the rover in a known stable state while they diagnose and deal with the problem. Which is just what they did.

--
455fe10422ca29c4933f95052b792ab2

NASA Rocks! by blueZhift · 2004-02-21 18:23 · Score: 5, Interesting

Great article! This is just the sort of thing that has always impressed me about NASA and the JPL. Just when mere mortals might give it up and walk away, they figure out the problem. I can only imagine how wild the party must have been after they fixed Spirit; the scientists and engineers I've worked with in the past could really put away the booze.

Seriously though, the key lessons to take away from this are.

1) Gather all of the clues you can.

2) Take those clues and build a model.

With luck and care, the model should get you closer to what may have gone wrong. And in this case it apparently did just that. Now that's geek cool!

BTW, I know that generally you want to prevent this sort of thing from happening. But in reality most software ships with bugs and launch windows to Mars are non-negotiable.

--
To the making of books there is no end, so let's get started

Re:NASA Rocks! by glassesmonkey · 2004-02-21 22:19 · Score: 1

I don't get all this 'we are not worthy' coming from the masses. The way I read it, they tried to clean up some files (side note: seems a poor software system that they didn't already take this into account) so they sent this clean-up routine to Mars, but half got corrupted. This half complete program is running and crippling the system and they can't figure out what went wrong.

Wouldn't "major mucking with filesystem" -> reboot -> gee doesn't work now... be kind of obvious where they messed up.

Anyways, I don't think they get kudos for sticking with it and keeping on keeping on and finally fixing the rover. You really think NASA has a hiring policy against mere mortals. You really think the average employee working on a project would have just turned off their computer and played with their belly-button.
Re:NASA Rocks! by roskakori · 2004-02-22 01:03 · Score: 2, Interesting

Seriously though, the key lessons to take away from this are.

1) Gather all of the clues you can.
2) Take those clues and build a model.

you forgot this one:
0) predict failure scenarios in the design phase, think them through, and design accordingly.

when you read the article, you will notice that a lot of plans and tools already existed that allowed them to trace the problem. this is one of the major differences between armchair coding and reliability engineering.

Remote safe mode by Megane · 2004-02-21 18:27 · Score: 3, Interesting

The first thing needed to achieve remote maintainability on the order of space probes is some way to access a machine remotely when it's not running the full OS. A KVM switch isn't going to work over long distances. The BIOS needs a way to run over the network. Same for the kernel boot messages. Whether it's through a serial console and SSH server, or through the BIOS running TCP/IP, what we have now isn't enough. A separate console server could also control a power cycle/reset switch circuit.

There also needs to be a way to load bootstrap code remotely. For instance, having a TCP/IP enabled BIOS be able to run TFTP or some other protocol to load a netboot floppy image. Then you could give it a LILO command instructing it where to find a boot image, preferably one on a server in the same hosting center.

--
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }

Re:Remote safe mode by Anonymous Coward · 2004-02-21 19:42 · Score: 0

This is "Interesting"?

Buy a real server. Real servers (and yes, they can be x86) allow you to reach the power, keyboard, mouse, and monitor across the network, even when the machine is not running the OS (or not even on). They allow you to create a virtual floppy disk, send it over the network, and boot from it. These days they can emulate a USB cdrom drive and you can install your OS over THAT.

So actually, all this has been done, and it is quite enough.
Re:Remote safe mode by ninjaz · 2004-02-22 01:00 · Score: 1

There also needs to be a way to load bootstrap code remotely. For instance, having a TCP/IP enabled BIOS be able to run TFTP or some other protocol to load a netboot floppy image. Then you could give it a LILO command instructing it where to find a boot image, preferably one on a server in the same hosting center.

Sun hardware has had serial consoles that can boot from the network for years. The syntax for the current OBP (OpenBoot PROM) revisions is here: http://docs.sun.com/db/doc/817-2701/6mibjioqr?a=view
Coupled with a terminal server/power management module such as this you'll get all those features.
For x86 hardware, some vendors are shipping with serial console capabilities which include network booting, such as Dell's DRAC
Remote floppy boot. DRAC offers remote media access, allowing the server to boot from remote media. DRAC II uses floppy redirection. Administrators can insert a bootable DOS diskette into the diskette drive of the desktop machine and boot a remote server to that floppy. Administrators can then run operations from the floppy, including functions such as flash BIOS to recover servers with BIOS problems.
DRAC III uses Trivial File Transfer Protocol (TFTP) to transfer an image to the card and lets administrators enhance remote floppy performance by downloading floppy images to the memory on the card (see Figure 5 ). Functions on the "diskette" are executed in a DOS environment for 32-bit systems.

Re:NASA should have simulated... by updog · 2004-02-21 18:34 · Score: 3, Informative

The fact that they filled up the flash memory with too many files that were accumulated during the cruise phase of the mission between earth and mars was something that they should have known would happen. Apparently you didn't read the article. Because of a communication failure, a utility that was supposed to delete the old files didn't get completely uploaded. The utility was scheduled for retransmission, but the filesystem filled up before it got re-transmitted.

whoops by usillyman · 2004-02-21 18:38 · Score: 5, Funny

Operating System not found. Press any key to continue.
Damn! Left the floppy in!

Re:whoops by Anonymous Coward · 2004-02-22 06:54 · Score: 0

If only it was the Knoppix CD I'd left in instead!

Could an earthbout 'twin' computer help? by AaronStJ · 2004-02-21 18:38 · Score: 4, Interesting

What surprises me is that they don't have a 'twin' of the rover's computer system set up on earth. When commands are run on the rover, the same commands could be run on the computer system on earth. Then, if the rover's software, fails (as it did), the software on earth would (theoretically) fail in a similar way, and be MUCH easier to debug. Of course, the systems wouldn't be identical (without building an entire duplicate and expensive rover), and the data gathered wouldn't be identical, but the twin could be carefully planned and fed dummy data that approximately mirrored the data the rover was gathering. For example, the twin could be fed dummy pictures about as often as the rover took a real picture.

From the article "[The] transmission that uploaded the utility was a partial failure: Only one of the utility program's two parts was received successfully. The second part was not received, and so in accordance with the communications protocol it was scheduled for retransmission on sol 19." NASA could have simulated a half failed transfer on the twin computer on earth, and then watched carefully using traditional debugging tools to make sure the failed transmission didn't cause a software failure (which it did).

Again, from the article "The data management team's calculations had not made any provision for leftover directories from a previous load still sitting in the flash file system." However, if they had a twin computer system to watch, they would have seen the failure occur on earth as it did in space. Debugging a system you can hook a serial debugger to is bound to be much easier than debugging a system a million miles away.

--
Stupid like a fox!

Re:Could an earthbout 'twin' computer help? by Anonymous Coward · 2004-02-21 18:51 · Score: 4, Informative

Uhmm... we DID build a 'twin' of the rover, hardware and all. Give us a bit more credit, will ya? :-P What you may not realize is that exposure to radiation on the surface of Mars, solar wind while in transit and other factors such as thermal expansion / contraction, etc. are slowly degrading the rovers in nondeterministic ways. It is not nearly as simple as 'running the commands in the testbed' at JPL to diagnose any problems which occur.
Re:Could an earthbout 'twin' computer help? by Gogo+Dodo · 2004-02-21 18:51 · Score: 2, Informative

They do have a twin system here, but having one here isn't quite the same as the two on Mars. You can't replicate everything on the two Mars rovers such as the science data files.
When Spirit was turned around on its lander, they tested the moves on its twin here, hence the long delay getting off the lander.
Re:Could an earthbout 'twin' computer help? by Hans+Lehmann · 2004-02-22 08:10 · Score: 1

Then, if the rover's software, fails (as it did), the software on earth would (theoretically) fail in a similar way, and be MUCH easier to debug.
Theory \The"o*ry\, n. 1. A small town in Texas where everything works.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Re:Could an earthbout 'twin' computer help? by coloclone · 2004-02-23 10:28 · Score: 1

Remember the 1201 and 1202 program alarms that occurred during the Apollo 11 lunar landing? This was caused by a "misconfiguration" of the rendezvous radar before flight which resulted in sending erroneous requests to the main computer...In this case the system was tested over and over but a last minute change caused the problem.

The key is to design the system to heal and recover without compromising the mission.

When dealing with space travel you have to expect the unexpected and plan for it. Anyone that thinks that with unlimited money and testing you can design a perfect system is delusional.

There is a significant lesson to learn, here .. by Anonymous Coward · 2004-02-21 18:41 · Score: 3, Insightful

.. namely, "Do Not Use VxWorks". Use something stable instead. eCos comes to mind. So does everyone's favorite OS these days, which has RTOS support. Having been a frustrated VxWorks user in the past, I'd no more entrust my mission-critical services to it than I would to Microsoft. -- TTK

Re:There is a significant lesson to learn, here .. by Anonymous Coward · 2004-02-22 01:22 · Score: 0

"Do Not Use VxWorks"
No kidding. Up until this point WindRiver was getting a lot of mileage out of the fact that their stuff was running NASA equipment. Now we all know far too much about the limitations and bugs in the RTOS.
I bet when it came out that Spirit was rebooting 60 times a day because the OS doesn't know what else to do when an error is encountered, that the Wind River people all had to go buy new underwear. In a year they'll probably be history.
Re:There is a significant lesson to learn, here .. by Anonymous Coward · 2004-02-22 02:19 · Score: 0

Riight, the OS doesn't know what else to do. Are you familiar with the concept of hardware watchdogs?
Re:There is a significant lesson to learn, here .. by Anonymous Coward · 2004-02-22 03:24 · Score: 0

It appears neither Wind River nor NASA did.
Re:There is a significant lesson to learn, here .. by Mr.+Droopy+Drawers · 2004-02-22 05:09 · Score: 1

TROLL!!

You apparently don't know the difference between a pre-emptive RTOS and simple task switching. I too have problems with vxWorks (especially with their purchase of SingleStep & Diab). Especially bundling. Don't get me started with their removal of SingleStep Monitor support unless you use it with vxWorks.

We switched to another RTOS due to the above issues and are slowly moving to other products. It's hard to find decent tools besides SingleStep and Diab. But, we're not upgrading to newer versions of those.

--
To Copy from One is Plagiarism; To Copy from Many is Research.
Re:There is a significant lesson to learn, here .. by Anonymous Coward · 2004-02-22 14:11 · Score: 0

I'll admit to not knowing jack about embedded systems, but isn't a filesystem that can't manage a 200MB volume in 120MB of RAM fairly brain-dead?

I was wondering why they'd written their own filesystem. Seemed like re-inventing the wheel to me. Now it looks like they didn't - they bought a broken filesystem. I dunno if that's better.

There's an interesting read here (link is down ATM):
http://flightlinux.gsfc.nasa.gov/docs/FlightLinux_final.htm

An attempt at using Linux and COTS hardware for space hardware. They ran into the dreaded export restrictions (missile control system,) among other things.

Re:Ran out of flash disk space. No, really. by AaronStJ · 2004-02-21 18:43 · Score: 1

They forgot to delete old files when updating the programs in the flash memory (which is mounted like a filesystem, or hard disk), and the OS was failing because it wanted to use that space.

It's not even that they forgot to delete old files. The program they sent to delete the old files failed to upload correctly, and they ran out of space before they could retransmit the delete program.

--
Stupid like a fox!

Ran out of INODES. No really. by dorko · 2004-02-21 18:51 · Score: 5, Informative

If you RTFA you will realize that I'm not lying in the least when I say that, effectively, they ran out of flash-based "disk" space!

Well, I did read the article and I wouldn't say it quite like that. The article says: "Spirit attempted to allocate more files than the RAM-based directory structure could accommodate." Furthermore, the article says that the low-level file manipulation commands "worked directly on the flash memory without mounting the volume or building the directory table in RAM ."

To me, if this were a Unix-like system, it sounds like they ran out of inodes. Running out of inodes is very different than running out of disk space.

If you think running out of disk space can be hard to troubleshoot, try running out of inodes.
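
If you want to see the two numbers side by side on a Unix-like box, something along these lines should do it. It's a minimal sketch in Python using os.statvfs (Unix only); the usage helper and the "/" path are just my choices for the example, and some filesystems report zero total inodes, which is why the helper returns None in that case.

```python
import os


def usage(path="/"):
    """Report block usage and inode usage separately for one filesystem."""
    st = os.statvfs(path)

    def pct_used(total, free):
        # Some filesystems do not report a fixed inode count (total == 0).
        return None if total == 0 else round(100.0 * (total - free) / total, 1)

    return {
        "blocks_used_pct": pct_used(st.f_blocks, st.f_bfree),
        "inodes_used_pct": pct_used(st.f_files, st.f_ffree),
    }


report = usage("/")
print(report)
```

A filesystem can show plenty of free blocks and still have inodes_used_pct at 100, which is the case that is so confusing to debug. It is roughly the comparison you get from df versus df -i.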

Re:Ran out of INODES. No really. by Penguinshit · 2004-02-21 20:41 · Score: 1

I did exactly this setting up a self-contained backup system for some co-located servers using Arkeia.

The problem drove me batshit until I realized that, because Arkeia (at least that version, some 4 years ago) liked to make a complete mirror of the remote filesystems being backed up (albeit files of zero byte size), the backup server was running out of inodes and crashing although the reported disk usage on the /usr filesystem was nowhere near 100%.

I re-made the filesystems and re-loaded everything on the backup server.

--
I have something in common with Stephen Hawking...
Re:Ran out of INODES. No really. by Concerned+Onlooker · 2004-02-21 22:14 · Score: 3, Funny

If you think running out of disk space can be hard to troubleshoot, try running out of inodes.
That's why I always keep a spare bag or two of inodes on hand, just in case. They're small so they don't take up too much space in the closet. I store them next to those f-stops I used to use for photography.

--
http://www.rootstrikers.org/

Ford Reliability shell game by MythoBeast · 2004-02-21 18:53 · Score: 0, Offtopic

Regardless of your personal experience, it is Ford's habit to replace reliable vehicles with unreliable ones. The classical example of this is the Festiva. Those little things just went and went, got excellent reviews in Consumer Reports, and really upset a few Ford corporate executives.

They replaced the vehicle with the Aspire, which Ford dealership auto mechanics quickly nicknamed the "expire" due to their regular need for maintenance. They still sold quite a number of them due to the reputation of the previous vehicle.

--
Wake up - the future is arriving faster than you think.

Re:Ford Reliability shell game by Anonymous Coward · 2004-02-28 06:26 · Score: 0

Ok, what could possibly be offtopic about that post? The topic is "product reliability", and successive previous posters applied that topic to cars and, specifically, Ford cars.

mod parent down by ChrisCampbell47 · 2004-02-21 18:57 · Score: 2

Wrong wrong wrong, as I'm sure someone else will post. He spins a good yarn but he's just a machine room flunky and hasn't RTFA himself.

--
One simple rule for its versus it's

The scariest thing I've heard in a while by stevenzenith · 2004-02-21 19:02 · Score: 0, Insightful

I can't believe that this is the state of the art at NASA - no wonder Shuttles fall from the sky.

Re:The scariest thing I've heard in a while by Anonymous Coward · 2004-02-21 20:05 · Score: 0

You naive nimwhit. Go read other posts about the requirements of space flight and logistical planning for projects with such long-leads that you were probably still in grade school.

Dickhead.
Re:The scariest thing I've heard in a while by stevenzenith · 2004-02-24 13:04 · Score: 0

With respect to my friend here. I have had some exposure to space programs and their requirements over the years. That they choose to use seat of the pants software engineering as opposed to formal verification surprises me. Not because the technology was specified years ago but that they chose not to use more discipline then.

Yes, see my post by SuperKendall · 2004-02-21 19:06 · Score: 2, Insightful

I had pretty much the same post - the originator of the story confuses luck with skill, a mistake I find very annoying and committed all too frequently. I'll fully admit when I've been lucky, but I also want recognition for foresight when I've had some! NASA deserves at least that much respect.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley

Lots of copper by Anonymous Coward · 2004-02-21 19:09 · Score: 2, Funny

Duh. That's what they have been keeping a secret. They have a DB9 serial link strung from here to the landing site. It's not as cool as you all make it out to be.

Does Microsoft know about this? by superyooser · 2004-02-21 19:13 · Score: 2, Interesting

The operating system is Wind River Systems' Vx-Works version 5.3.1, used with its flash file system extension.

First wxWindows, now Vx-works?

Re:Does Microsoft know about this? by Anonymous Coward · 2004-02-21 20:08 · Score: 0

Wind River has been around almost as long as Microsoft:
Microsoft
Founded: 1975, Albuquerque, New Mexico
Incorporated: June 25, 1981
IPO: March 13, 1986
Wind River Systems (2)
Founded: 1981
Incorporated: 1983, California
IPO: April 15, 1993

Great trick for ssh administration by nsayer · 2004-02-21 19:17 · Score: 4, Insightful

Before doing something risky, type this:

sleep 600 && reboot &

Now if your risky maneuver makes the ssh session unusable, just wait 5 minutes for the machine to reboot.

This is great for fiddling with firewalls by remote control... through the firewall. :-)

Oh... You say you're not using a POSIX-like system? That's not supported. Sorry. :-)

Re:Great trick for ssh administration by FrostedWheat · 2004-02-22 00:55 · Score: 1

Yea but how do you stop it rebooting if the risky job works? If you kill sleep (hehe) it'll reboot.

(IANA*G - I am not a *nix guru)
Re:Great trick for ssh administration by Anonymous Coward · 2004-02-22 03:26 · Score: 2, Informative

No, if sleep finishes successfully it'll reboot. If you kill sleep, it'll exit with some big code (on Linux it would be 130). sleep exiting with code 130 will cause && to not execute the consequent.
Ergo, if everything works and you don't want to reboot anymore, you just do a little % followed by a little ctrl-C and it's all good.
Incidentally, sleep 600 will make the machine sleep for 10 minutes, not 5 minutes, as the OP said :)
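
The same "arm a fallback, cancel it if things go well" pattern can be sketched outside the shell too. Below is a small Python version using threading.Timer; arm_failsafe is a hypothetical helper name, and the action here only appends to a list as a stand-in for a real reboot.

```python
import threading


def arm_failsafe(delay_seconds, action):
    """Schedule a fallback action; the caller cancels it if the risky change works."""
    timer = threading.Timer(delay_seconds, action)
    timer.daemon = True
    timer.start()
    return timer


fired = []

# Case 1: the risky change worked, so cancel before the delay runs out.
t1 = arm_failsafe(0.05, lambda: fired.append("reboot"))
t1.cancel()
t1.join(timeout=1)

# Case 2: we got locked out and never cancelled, so the fallback fires.
t2 = arm_failsafe(0.05, lambda: fired.append("reboot"))
t2.join(timeout=1)

print(fired)
```

Like the shell version, this only helps while the process holding the timer stays alive.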
Re:Great trick for ssh administration by FrostedWheat · 2004-02-22 04:09 · Score: 1

Thanks! Makes sense now.

It's a really neat trick!
Re:Great trick for ssh administration by Helvick · 2004-02-22 04:35 · Score: 2, Funny

You mean like the time I sent myself 22000 mails because the OS in question didn't have an implementation of sleep.
Re:Great trick for ssh administration by stere0 · 2004-02-22 04:49 · Score: 1

The grandparent's command will sleep for 600 seconds in the foreground and then fork reboot in the background. What you want is this:

shutdown -r +10

To cancel it, type
shutdown -c
IIRC, your machine will still be hosed if your shell dies though. Ideally, you want to run this inside screen or something.

--
Trollem mirabilem hanc subnotationis exigiutas non caperet
Re:Great trick for ssh administration by Anonymous Coward · 2004-02-22 09:00 · Score: 0

on nt there's |at|, which is a bit more controlled than sleep.

it even has a /delete command, so you can very easily cancel a task (e.g. your reboot). this presumes you're using a commandline reboot app which doesn't support cancelling. but perhaps you're using:
http://www.sysinternals.com/ntw2k/freeware/psshutdown.shtml
which includes a way to abort a reboot.
Re:Great trick for ssh administration by Anonymous Coward · 2004-02-22 09:38 · Score: 0

TSSHUTDN works well on XP/2000/2003 MS systems actually:

TSSHUTDN [wait_time] [/SERVER:servername] [/REBOOT] [/POWERDOWN] [/DELAY:logoffdelay] [/V]

Brilliant! by frenchs · 2004-02-21 19:26 · Score: 3, Funny

They could have set it up out in my backyard to take pictures of the piles of crap and rocks out there and if they wanted to simulate the solar radiation, they could have my girlfriend give it one of her famous looks... cause those are lethal enough to burn a hole in your soul.

-SF

How'd they do it? by alwaystheretrading · 2004-02-21 19:28 · Score: 5, Funny

That must have been some feat to get the arm on the rover to press Ctrl, Alt and Delete at the same time!

Re:How'd they do it? by oohgodyeah · 2004-02-21 23:58 · Score: 5, Funny

Maybe it's all lies and the Martians hit Ctrl+Alt+Del...

--

- OohGodYeah!
Re:How'd they do it? by u-235-sentinel · 2004-02-22 02:54 · Score: 1

Actually they were able to telnet into it and resolve the problems that way. SSH was not working for some reason.

--
Has Comcast disconnected your Internet account? Same here. You can read about it at http://comcastissue.blogspot.com
Re:How'd they do it? by inertia187 · 2004-02-22 04:09 · Score: 2, Funny

NASA: Ok, let's do this. I want you to press CTRL-ALT-DEL.
Spirit: But I only have two fingers, you insensitive clod!

--
A programmer is a machine for converting coffee into code.
Re:How'd they do it? by rjamestaylor · 2004-02-22 05:45 · Score: 5, Funny

Actually, a friend of mine is a system admin with JPL and he had to drive out to the San Bernardino soundstage where the rovers are being filmed and reboot the computer at 4AM. The funny thing is he left a tool chest and sleeping bag (he was using it to minimize footprints and body impression, not sleep on the job!) where the Opportunity rover was scheduled to peek over the horizon and the ensuing photo of the tool chest / sleeping bag on the horizon had to be quickly -- and deftly, I must say -- explained away as being Opportunity's back shell and parachute.
Just another day in the life of a sys admin!

--
-- @rjamestaylor on Ello

Verifying the software !!! by vinit79 · 2004-02-21 19:30 · Score: 4, Informative

What really surprises me is that NASA did not verify the software. Software verification is essentially mathematically proving the software. It is tedious and expensive but we are talking about NASA and Mars. In fact even beloved MS formally verifies device drivers before use (believe it or not!!) If the original program was correct they wouldn't have to reupload it and the entire problem ...gone.

Re:Verifying the software !!! by jonastullus · 2004-02-21 20:34 · Score: 2, Interesting

you are quite aware that software verification is far from being usable for any languages found in the wild?!
first you need a model to verify your software against and as a matter of fact the model will again be written in some pseudo-language so that you not only double the workload but also introduce slight incompatibilities between the implementation language and the model language!
and then somebody still has to prove that there are no errors in your model for which you will need a meta-model, etc, etc, etc.

i'm not saying that verification is not practical or that it wouldn't be nice to have it, but there are obstacles that won't be solved for quite some years to come!
Re:Verifying the software !!! by vinit79 · 2004-02-21 20:43 · Score: 1

Yeah, I sort of agree, but we aren't talking about software in the wild, are we?? It was a project by NASA, the code would be comparatively small in size and the funds enormous, so I still believe verification would be the way to go.
Re:Verifying the software !!! by WayneConrad · 2004-02-21 20:50 · Score: 5, Interesting

Software verification is essentially mathematically proving the software....

I've been hearing how great formal verification is since I started this gig. Three decades later, it's still not what Yourdon and his buddies thought it would be. When the first computer scientists were budded from mathematics departments, their mathematical discipline allowed them to do wonderful things, some of which we're still catching up with. But it also gave them some disturbing habits, the worst of which is the insistence that formal verification is the best way to write code, and anyone not doing so must be a fool.

Formal verification is a powerful tool, but as you say, it is expensive and applies to only a limited set of problems. If it were so cheap and so widely applicable, we'd be using it everywhere.

We've poured decades of funding into formal verification, but the useful tools keep coming from other avenues of research. I think it's time to stop beating the formal verification drum.
Re:Verifying the software !!! by Anonymous Coward · 2004-02-21 22:15 · Score: 1, Insightful

I believe the software was in fact working exactly as expected here so no amount of formal verification would have helped.

Perhaps with hindsight it might have worked a little differently but you can't foresee every combination of events and in fact the software worked flawlessly allowing them to recover in exactly the way they designed it.

If this had been sitting on a desk next to them they probably would have sorted this out in 20 minutes but obviously they needed to do this in a methodical and careful way to verify what they thought this problem was and they needed to do this over a very slow and very delayed link which was being affected by the problem. Which is no doubt why it took a few days.

Bud Light Presents...Real Men of Genius. by Blaede · 2004-02-21 19:33 · Score: 4, Funny

Today we salute YOU, Mr. Super Wizard Windows Reinstaller.

Only YOU can fully appreciate the difficulty of running a format c: command, while swilling a room temperature can of Red Bull.

"Hey this stuff is hard now!"

While NASA is too preoccupied with things like faraway rovers, you take your vocational tech school fueled arrogance directly to the place where it will make the absolute least possible impact: A Slashdot discussion thread.

"Loggin' on now!"

Your unique eye for obviousness allows you to sling turds of obtuseness every which way, and then brag about how you were RIGHT as soon as one of your pronouncements hit true - regardless of how many times you were wrong before.

"See I told you sooooooo!!"

And if some idiot rocket scientist has the unmitigated gall to not bow down to your obvious Geniusdom, you unleash your fury down upon him with all the tenacity and mercilessness of a rabid pit bull with a tender buttock locked in its jaws.

"Total anonymity!"

So keep clicking away, oh Marauder of the Mousepad. Because when the results you so desire finally come about years from now, you can say it was because YOU demanded it.

"How come they haven't fired that dumbass head of NASA yet?"

(Bud Light Beer, Anheuser Busch, St. Louis Missouri.)

Re:Bud Light Presents...Real Men of Genius. by Anonymous Coward · 2004-02-22 03:49 · Score: 0

I gotta say it...

You are SO da' man.

They didn't just randomly delete stuff by enosys · 2004-02-21 19:34 · Score: 4, Insightful

From the article:

Using the low- level commands, about a thousand files and their directories -- the leftovers from the initial launch load -- were removed.

I think that means they deleted the useless stuff they wanted to delete anyways but didn't get to delete before the crash. I also remember news about science data from before the crash that was received after they got the rover working again.

As for how critical it is, well yeah, it seems the rover didn't need the contents of the flash file system. The operating system and other software was in the same flash memory but I assume that any sane designer would put in some hardware write protect interlock that's not easy to defeat accidentally.

Re:They didn't just randomly delete stuff by edesio · 2004-02-22 00:48 · Score: 2, Informative

It seems to have two different flashes: a larger one for new files and a smaller one for programs. This would make it easier to manage.

"...Separately, about 230 Mbytes are used to implement a flash file system..."

Re:NASA should have simulated... by KewlPC · 2004-02-21 19:47 · Score: 2, Insightful

You realize that missions to Mars can only be launched once every two years, right? If they miss their launch window, they've got to wait two years before they can launch again.

You also realize that NASA did do a test mission, right? They built a test rover and put it out in a desert somewhere. They used the mission to test the hardware, test the software, and to help train the team.

OT:lots of mem of an embedded system by MechaStreisand · 2004-02-21 19:54 · Score: 0, Offtopic

MEGABYTES. You mean MEGABYTES. Don't use that revisionist MiB crap - 1024^2 is MEGA in computers and always has been. Don't let storage manufacturers redefine the language for their own gain!

--
Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.

Re:OT:lots of mem of an embedded system by QuadPro · 2004-02-21 20:15 · Score: 1, Insightful

And what are the specifications of your network connection? Those *are* measured in base 10: 1 megabit/second = 1000 kilobit/second = 1000000 bit/second.

If there was any revisionist crap here, it was the defining of M, G, and so on to be used in base 2 (1024) in the first place. Those are standardized units!
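
For anyone who wants the actual numbers behind this argument, here is a quick Python sketch. The 230 figure is just the flash file system size quoted earlier in the thread, and the helper name is made up for illustration.

```python
# Compare decimal (SI) and binary (IEC) readings of the same prefix.
SI_MEGA = 1000 ** 2      # 1 MB  = 1,000,000 bytes
IEC_MEBI = 1024 ** 2     # 1 MiB = 1,048,576 bytes

def prefix_gap(count):
    """Return (si_bytes, iec_bytes, shortfall_percent) for `count` mega-units."""
    si_bytes = count * SI_MEGA
    iec_bytes = count * IEC_MEBI
    shortfall = 100.0 * (iec_bytes - si_bytes) / iec_bytes
    return si_bytes, iec_bytes, shortfall

si, iec, pct = prefix_gap(230)
print(f"230 MB  = {si:,} bytes")
print(f"230 MiB = {iec:,} bytes")
print(f"decimal reading is {pct:.2f}% smaller")
```

At the mega level the two readings differ by a bit under 5%, and the gap widens with each larger prefix.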
Re:OT:lots of mem of an embedded system by millette · 2004-02-21 20:23 · Score: 2, Interesting

I'm serious. http://physics.nist.gov/cuu/Units/binary.html for all the groovy details. If anything, it's a move away from the hd manufacturers lingo.
Re:OT:lots of mem of an embedded system by Anonymous Coward · 2004-02-21 20:55 · Score: 0

in that system, "gigs" with 1024 based units would be "gibs". that might have some potential after all
Re:OT:lots of mem of an embedded system by millette · 2004-02-21 21:09 · Score: 1

Actually, if you mean "gigs" like in gigabytes, then this abbreviation is appropriate: GiB; I'm not too comfortable adding an "s" when it's plural. Another thing, the standard abbreviation for a byte is "B" - in capital. "b" doesn't mean anything. If you want to say "bit", you have to spell it out. But this is _way_ off topic, sorry...
Re:OT:lots of mem of an embedded system by Anonymous Coward · 2004-02-22 00:31 · Score: 0

MiB stands for Man in Black, AFAIR :) so it's them behind the story.. again!
Re:OT:lots of mem of an embedded system by jsebrech · 2004-02-22 08:08 · Score: 1

Network connections are bit based, and therefore deal in powers of two. Any computer memory is bit based, and therefore also measured in powers of two. Yes, it's possible for a hard disk's physical size to be a multiple of 1000 instead of a multiple of 1024, but once you turn it into a filesystem, you're dealing with integers again, and therefore with powers of two.

Don't be fooled. Mebi was invented so the hdd makers could keep up their lying about hdd sizes.
Re:OT:lots of mem of an embedded system by MechaStreisand · 2004-02-22 09:19 · Score: 1

Wow, a mod war on my comment!

Anyway, here's the thing. In computer science, we don't use SI units for storage. Never have. Because it's convenient to use powers of 2 for denoting storage arrays, it's become a de facto standard with years of tradition behind it. Everything was going along fine until the hard drive (and DVD) manufacturers decided that they'd break with the standard and lie* about their hard drive capacities to make them look bigger. The only reason they can get away with that lie is because of SI units.

Now, I realize that one possible way to resolve this is to switch to a different system of measurement, but the problem is, WE shouldn't have to switch. THEY are the ones who messed everything up, and instead of them forcing us to change, we should educate people, and let them know that the hard drive manufacturers are a bunch of lying assholes.

* The specs given by hard drive manufacturers are indeed lies, since they changed from using the correct measurements to the metric measurements, knowing full well that the latter were never used to measure storage capacity. Their intent was to deceive.

--
Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.

Re:only 120 megs ram? by KewlPC · 2004-02-21 20:05 · Score: 5, Informative

You realize that the onboard computer is basically the same one as used on the Mars Pathfinder lander, right? Same CPU, same amount of RAM, even the same OS. I wouldn't be surprised if they used the same (or similar) circuit diagrams for certain things.

The point is to use well known and well tested hardware. The whole point of Mars Pathfinder was to develop a system whose design could be re-used for other Mars landers and rovers.

Lastly, what exactly are you going to do with greater flash capacity? The point of having any flash memory on the rovers at all is not for long term storage, but rather just to hold onto data until it can be transmitted to Earth, after which it gets deleted.

Despite what some idiot posted a few posts up, they did NOT run out of room on the flash drive. Rather, the problem is more akin to running out of i-nodes. Mounting the flash filesystem, reading all its metadata and whatnot, took up more RAM than was allocated for it, due to the high number of files it had to deal with (most of which were accumulated on the way to Mars, and were going to be deleted).
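
A toy model of that failure mode, in Python. Every number here (the RAM cost per directory entry, the RAM budget) is invented for illustration and has nothing to do with the real rover or VxWorks values. The only point is that mount-time RAM scales with the number of files, not their total size.

```python
# Toy model: mounting a file system builds an in-RAM table with one
# entry per file, so RAM use tracks file COUNT, not file SIZE.
# ENTRY_BYTES and RAM_BUDGET are made-up numbers, not rover values.
ENTRY_BYTES = 64            # hypothetical RAM cost of one directory entry
RAM_BUDGET = 64 * 1024      # hypothetical RAM reserved for the table

class MountError(Exception):
    """Raised when the directory table does not fit in the RAM budget."""

def mount(file_sizes):
    """Return RAM used by the directory table, or raise MountError."""
    needed = len(file_sizes) * ENTRY_BYTES   # file sizes are never consulted
    if needed > RAM_BUDGET:
        raise MountError(f"need {needed} bytes, budget is {RAM_BUDGET}")
    return needed

few_big = [10_000_000] * 20        # 200 MB spread over 20 files
many_small = [100] * 2000          # 200 KB spread over 2000 files

print(mount(few_big))              # fits easily
try:
    mount(many_small)              # far less data, far more entries
except MountError as err:
    print("mount failed:", err)
```

The second set holds a thousandth of the data and still fails to mount, which is the shape of the problem described above.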

The only reason nasa got it back to work by PipoDeClown · 2004-02-21 20:13 · Score: 3, Informative

is because when the batteries got drained the os went into a stable "safe mode" state. If they made a long lasting powersupply this project was doomed(.f) and they never found out what the real problem was.

What we can learn: by sakusha · 2004-02-21 20:14 · Score: 4, Insightful

It appears that we still haven't learned the biggest lesson of all. I still remember back around 1970, there was a big sign on the wall next to the IBM 370s at my university, written on a primitive pen plotter, it said:

Computers never make mistakes, they do exactly what humans tell them to do. All "computer errors" are human errors.

Re:What we can learn: by ColaMan · 2004-02-21 22:34 · Score: 2, Interesting

Unless you own an early pentium.

--

You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.
Re:What we can learn: by Anonymous Coward · 2004-02-22 03:57 · Score: 1, Insightful

Wrong. That was a human error in designing the processor.
Re:What we can learn: by roman_mir · 2004-02-22 03:58 · Score: 2, Insightful

Really? What about external factors like a radiation spike that kills some of your hardware, will the computer do what you tell it to do then correctly?

--
You can't handle the truth.
Re:What we can learn: by sakusha · 2004-02-22 04:50 · Score: 1

That's not an error, it's a malfunction.
Re:What we can learn: by swillden · 2004-02-22 08:30 · Score: 1

Computers never make mistakes, they do exactly what humans tell them to do. All "computer errors" are human errors.
This is certainly true when the computer hardware is well-maintained, sitting in a climate-controlled machine room and running on conditioned power.
When the hardware has been loaded into a rocket, accelerated out of the Earth's gravity well, through the Van Allen belts, across 100 million miles of open space and crashed into the rocky surface of an alien planet where it runs off of batteries recharged at intervals by dust-accumulating solar panels... well, there are one or two other things that can go wrong.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:What we can learn: by sakusha · 2004-02-22 12:52 · Score: 1

You still haven't learned the lesson. Those are not errors or mistakes, they are malfunctions. A properly designed computer system can easily detect malfunctions (i.e. ECC RAM) but that same system will happily execute any human-designed code containing massive errors.

You're just the kind of computer geek I abhor, always looking for excuses instead of solutions to your own mistakes.
Re:What we can learn: by swillden · 2004-02-22 14:04 · Score: 2, Informative

You still haven't learned the lesson. Those are not errors or mistakes, they are malfunctions. A properly designed computer system can easily detect malfunctions
Guess you'd better get over to NASA and set up a series of lectures so that you can impart your vast expertise and wisdom.
but that same system will happily execute any human-designed code containing massive errors.
Interesting that you point out the code as being human-designed. Who designed the hardware? God?
You're just the kind of computer geek I abhor, always looking for excuses instead of solutions to your own mistakes.
And you're just the kind of self-assured idiot who amuses me endlessly with your clueless but oh-so-confident assertions.
In the real world, hardware defects do exist, some designed into the hardware, others induced by external effects or damage. Software errors are certainly far more common, but that's mostly just because there's vastly more software.
Even without the effects of space travel, hardware contains flaws and, indeed, much of the job of low-level software is to work around those flaws. It's not uncommon for a significant percentage of the code in a device driver to be dedicated to working around various hardware defects.
Anyone who's spent considerable time working around custom and embedded computing hardware knows that defects often turn out to be *both* hardware and software-based. Insignificant hardware bugs interact with insignificant software bugs to produce major problems. Hardware defects aren't limited to those environments, either. Spend a little time searching the LKML archives for "ACPI" and reading what you find, or even just look through the Linux kernel configuration help and see how many configuration options you find that implement software hacks to work around problems with particular pieces of hardware.
When you factor in the rather unique and harsh operating environment of this hardware and software, and consider the amount and depth of testing that certainly went into the development process, it's not in the least bit unusual that the programmers should be surprised that the flaw was purely a software error. If I'd been in those engineers' shoes, I also would have expected something far more complex. I'm sure they went into it, quite reasonably, assuming that some hardware component had failed and that they were going to have to implement a software workaround.
I'm sure the prevailing sentiment when they finally discovered the actual nature of the problem was "Hallelujah! This is something we can fix!", not "Uh, oh, I can't blame this on anyone else." That's certainly how I would have felt, anyway.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:What we can learn: by sakusha · 2004-02-22 14:55 · Score: 1

They didn't listen back in 1977 when I was converting some of NASA's antiquated FORTRAN II programs to FORTRAN IV. They didn't listen at JPL either. What makes you think they'll listen now?

Spacecraft computer engineers today have the attitude that all hardware problems have been solved or dealt with through redundancy or hardening. I have often heard them say, "all problems are software problems." This wasn't a malfunction of hardware, it was a human error. And you still haven't learned anything.
Re:What we can learn: by swillden · 2004-02-22 15:44 · Score: 1

They didn't listen back in 1977 when I was converting some of NASA's antiquated FORTRAN II programs to FORTRAN IV. They didn't listen at JPL either.
Uh huh.
What makes you think they'll listen now?
Based on what you've said here? Not a thing, man, not a single thing.
This wasn't a malfunction of hardware, it was a human error.
Therefore it could not have been a hardware malfunction, and therefore the engineers should not have been surprised? Gotcha. Ain't hindsight grand?
And you still haven't learned anything.
Uh huh. Don't have anything of substance to say, I see. Come back when you have a point.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:What we can learn: by sakusha · 2004-02-22 18:35 · Score: 1

Go back under your rock, little troll. You keep cranking out your buggy code, and I'll keep trying to educate the next generation of programmers how to avoid errors. The first step is admitting your fallibility. I have no use for idiots like you who think they're infallible. HAND.
Re:What we can learn: by ColaMan · 2004-02-22 21:06 · Score: 1

Perhaps you could more easily educate the next generation of programmers if you weren't such an uptight and insulting fellow with an apparent superiority complex.

Your original post :
"All "computer errors" are human errors."

To which people have replied , but what about hardware failures, etc.

To which you replied :
That's not an error, it's a malfunction.
Bah! Semantics. :-)

You're just the kind of computer geek I abhor, always looking for excuses instead of solutions to your own mistakes.
A pretty sharp response to a person who questioned your definition of "error". I fail to see how you managed to classify the poster as a computer geek you abhor, for crying out loud, in just one post.

And you still haven't learned anything.
You still haven't really taught anything. You have at this point however, been quite rude.

Go back under your rock, little troll..... I have no use for idiots like you who think they're infallible.
Now you're just being abusive here. Besides, it looks to me like you think you're infallible as well ;-)

How are you going to teach the next generation when in 4 posts you have shown that you're highly intolerant of people with different, albeit possibly wrong, views?

--

You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.

Re:WindRiver's fault by KewlPC · 2004-02-21 20:17 · Score: 2, Informative

Actually, they used VxWorks because it was the same OS used for the lander on the Mars Pathfinder mission. Since they were using the same CPU and same basic computer design as the Mars Pathfinder lander, they probably figured, "Why not use the same OS?"

Re:NASA should have simulated... by gentoo_is_bogus · 2004-02-21 20:21 · Score: 0

Of course they tested it! Something is not quite right about this story though. How could such a seemingly simple problem have been missed?

--
-- Exposing the hype of Gentoo zealots. Modded into the ground to suppress opinion.

Re:Ran out of flash disk space. No, really. by SiliconEntity · 2004-02-21 20:28 · Score: 2, Informative

Here's what happened according to the article. They launched the ship with an OS image in flash, and soon realized that they needed to update it. So shortly after launch they sent another complete OS image. They knew they'd have to delete the first image, but they didn't do it right away. At that point there was plenty of room in the flash memory so having two OS images was not a problem.

After a few days on Mars, they were starting to fill up the flash, so they planned to go ahead and delete the old launch OS image, its directories and files. This is a complicated process so they uploaded a special program to do it on Sol 15. And apparently they informed the rest of the team that the memory would be free and available after that point, so the rest of the team made plans to start filling it up with pictures.

However, the upload on sol 15 failed, and was rescheduled for sol 19. Now, here's the big mistake (which the article glosses over): They forgot to tell the rest of the team that all that memory wasn't going to be freed up as planned, not for a few more days. So instead, Spirit is moving around now, taking lots of pictures, storing them in flash, and all the people involved with that think they have plenty of room. Little do they know that they are running out of flash space. Finally, the morning of Sol 19, shortly before the memory cleaning program was going to be sent down, it happened. The flash memory was exhausted. This triggered a sequence of events which put the craft into a failure loop.

The big problem here, then, was the failure on the part of the group which was supposed to clean out the launch OS image to tell the rest of the team that it wasn't going to happen as scheduled, so the memory wasn't going to be available. It wasn't really Murphy's Law, but rather a failure to communicate among the team. This is an institutional problem which will hopefully be fixed.

1991 Escorts. by Mustang+Matt · 2004-02-21 20:39 · Score: 0, Offtopic

There are lots of 91 escorts on the road. Pieces of junk? Yes, but they are still running.

--
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin

Very Little Transferable by severoon · 2004-02-21 20:51 · Score: 1

I wonder if the guy that put this article summary up read the article. They very clearly stated therein what the difficulty was and the unique confluence of events--specific to the rover's hardware and OS architecture--that led to the shutdown. What on earth (or on Mars) could we possibly take away from this experience that would lead to some ability to troubleshoot systems remotely?

I just don't see it...and I'm in computers for a living.

sev

--
but have you considered the following argument: shut up.

Re:NASA should have simulated... by garbagedisposal · 2004-02-21 20:53 · Score: 1

yer seems fishy to me

You are correct!!!! by Anonymous Coward · 2004-02-21 20:53 · Score: 0

>>> I know next to nothing about progamming

Absolutely correct.

Re:Ran out of flash disk space. No, really. by dorko · 2004-02-21 21:15 · Score: 4, Informative

[T]hey are running out of flash space. ... The flash memory was exhausted.

No, no, NO!

It was the inability to build the RAM-based directory structure of the files in the Flash memory.

Why couldn't they build the directory structure? They had too many files, the size of the files doesn't matter here, only the number of files.

In other words, they ran out of RAM, not Flash.

Exercise left for the readers: Why can a Unix file system that is out of inodes have much less than 100% disk usage and still not be able to create a file?
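
A hint for the exercise, as a Python sketch: a Unix file system keeps separate counts of free data blocks and free inodes, and `os.statvfs` reports both. This assumes a Unix-like system. Some file systems allocate inodes dynamically and report 0 for the inode total, which the helper below treats as "no fixed limit". The function name is mine.

```python
import os

def usage(path="/"):
    """Return (block_pct_used, inode_pct_used) for the file system at path.

    Either value is None when the file system reports a total of zero.
    """
    st = os.statvfs(path)
    block_pct = None
    inode_pct = None
    if st.f_blocks:
        block_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks
    if st.f_files:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return block_pct, inode_pct

blocks, inodes = usage("/")
print(f"blocks used: {blocks}, inodes used: {inodes}")
```

The two percentages move independently. A volume full of tiny files can exhaust its inodes while most of its blocks are still free, and `df -i` shows the inode side from the shell.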

Cars are like socks, disposable by Anonymous Coward · 2004-02-21 21:32 · Score: 0

Why do car interiors turn to
shit after 5 years?

Why do repair costs
start to outweigh buying a new car?

Why do car manufacturers offer buy backs?

The reason is simple.
The entire car industry depends on
people buying new cars every 5-10 years,
That's why cars are only really made to last 5 years, and that's why warranties only cover a car for up to 100 thousand miles.
It's nothing short of a conspiracy.

Re:WindRiver's fault by DA-MAN · 2004-02-21 21:38 · Score: 1

Actually, they used VxWorks because it was the same OS used for the lander on the Mars Pathfinder mission. Since they were using the same CPU and same basic computer design as the Mars Pathfinder lander, they probably figured, "Why not use the same OS?"

Actually they use VxWorks because WindRiver gives JPL major discounts...

--
Can I get an eye poke?
Dog House Forum

Re:WindRiver's fault by KewlPC · 2004-02-21 21:45 · Score: 3, Insightful

WindRiver may give JPL large discounts, but I doubt that's the only reason VxWorks is running on the MERs.

Years ago, when JPL was designing the Mars Pathfinder mission, they asked Wind River to do an "affordable" port of VxWorks to the RAD6000 (a radiation-hardened RS6000), and they agreed. Since the computers on the two MERs are very similar to the computer on the Mars Pathfinder lander, it makes sense that they'd use the same OS that they used on the MPF lander.

I would think the fact that JPL knows VxWorks very well by now would be a major factor in deciding to use VxWorks for the MERs.

ah-ha! by MasTRE · 2004-02-21 22:18 · Score: 1

> "We recognized early in the planning process that the flash file system had a limited capacity for files."

Wow, geniuses. As opposed to the regular file systems we use here on Earth, which have unlimited "capacity" for files?

> "But there were also directories of files already placed into the file system in the launch load,"

More advanced high-tech speak.. NOT!

All joking aside, when they try to make it easy-to-understand for laymen (or politicians?), they make it sound retarded to us.

I think the problem was actually with the imperial-to-metric conversion functions, they're just covering it up to avoid further embarrassment ;)

--
Must-not-watch TV!

Re:ah-ha! by tigertiger · 2004-02-22 06:09 · Score: 1

think the problem was actually with the imperial-to-metric conversion functions, they're just covering it up to avoid further embarrassment ;)
it's just that:
1 imperial MB = 1024*1024 bytes
1 metric MB = 1000*1000 bytes
oops :-)

Pre-91 escorts still going strong. by King_of_Prussia · 2004-02-21 22:29 · Score: 1

I am currently considering buying a 1980 MkII Escort. 118,000 k's on the clock, and it still goes sweet as. One of my friends drives an '85 that has no problems either.

--

Making the moon less necessary since 1998.

JPL by EachLennyAPenny · 2004-02-21 22:49 · Score: 3, Funny

The JPL is a pretty viral license. It forces you to spread their space probes from your planet to all your customer's planets. This is un-solar systematic! What's next? Calling GNUpiter Jupiter instead?

Re:NASA should have simulated... by hyc · 2004-02-21 22:54 · Score: 2, Interesting

One word: outsourcing.

When I worked at JPL, every 6 months to a year there'd be talks of layoffs because the headcount was too high; people would leave and return to the same projects as contractors, then get a higher hourly wage for doing the same work with less accountability.

The whole reason for that lost probe (feet vs meters, anyone?) was because of a political squabble between two teams (one JPL-internal, one outside contractors as I recall) who simply failed to cooperate productively. The whole management structure inside that world is screwed. People's project leads are not the same as their section/department leads, so the reporting chain is a mes{h,s}. Time and energy is wasted in contract(or) management, all in the name of "reduced costs" even though having all the work done in-house would eliminate a full layer or two of mid-level management waste.

NASA/JPL are totally hamstrung by beancounters who think they're saving the public's money, but truly can't see the big picture, missing the forest for the trees. (Either that, or they *do* see the big picture, and are busily lining their own pockets with the excess that gets tossed around thru all the churn.)

--
-- *My* journal is more interesting than *yours*...

Re:Ran out of flash disk space. No, really. by Mal-2 · 2004-02-21 23:01 · Score: 2, Insightful

Could this have not been said more succinctly with a simple quote? Namely:

"What we have here, is failure to communicate."

Mal-2

--
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.

Hmmmm by ziggy_zero · 2004-02-21 23:07 · Score: 3, Insightful

"The irony of it was that the operating system was doing exactly what we'd told it to do"

Funny, that's how it was explained to me by my computer science teacher my freshman year in high school. He said, "The problem with computers is that they do exactly what we tell them to."

--
I belong to the ______ generation.

Discovered a system log ? by thrill12 · 2004-02-21 23:31 · Score: 3, Interesting

"We discovered a system log in which the problem was documented,"
Those guys are running a very expensive experiment, are logging it and they have no idea what and where they are logging??

--
Slashdot: stuff for news, nerds that matter, matter for news, stuff that nerd

Re:Discovered a system log ? by heneon · 2004-02-22 00:29 · Score: 2, Funny

Those guys are running a very expensive experiment, are logging it and they have no idea what and where they are logging??
When building the rover, they probably just put vxworks installation cd in, and selected "Typical Install for inter-planetary missions", clicked "Next" a couple of times and got the OS running in no time.
Now, if the NASA engineers are anything like me, they didn't bother to check what was being logged and where... after all, when a problem arises, then you can go to /var/log/ to see if there's anything interesting.
Re:Discovered a system log ? by FrostedWheat · 2004-02-22 00:52 · Score: 1

They had no control over what the rover was sending them. If I remember correctly they started getting unrequested engineering data. Perhaps they 'discovered' the logs with the problem among that data.

I would imagine that the people who put together that system know it inside out. They have to!!
Re:Discovered a system log ? by roskakori · 2004-02-22 01:23 · Score: 2, Insightful

maybe the guy really is from PR and doesn't know how to carefully phrase sentences targeted at a technical audience, but these also hit my eye:
"It was recognized just after [the June 2003] launch that there were some serious shortcomings in the code that had been put into the launch load of software," said JPL data management engineer Roger Klemm.
i know this is common within the software industry, but if this happens on such a project, it looks like plain incompetence to me.
Klemm said that with the leftover directories and their files removed, the system is now functioning well. But just in case, the team is working on an exception-handler routine that will more gracefully recover from an allocation failure.

allocation errors are the easiest to predict. even if you don't handle them gracefully (which often can be near to impossible), most of the time you can log them. of course, a reliable, redundant log facility is one of the most crucial components of such a system...
writing this from my armchair, of course i can't really judge their competence and claim i could have done better. still, the article makes me suspicious.

Parent should be modded down by Dan+East · 2004-02-22 00:19 · Score: 2, Informative

I did read the article, and my comments are completely accurate. Unfortunately you must not have made it to the 3rd paragraph, and neither did the mods that modded you up and me down.

The problem was discovered after launch. The first few fixes made the problem worse by stressing the filesystem even further.

It doesn't matter that they were trying to fix the problem. THAT WAS NOT MY POINT. The problem should have been identified and fixed before the craft was launched.

Yes, they may have taken "around" 100000 pictures. Does that mean they sequentially stored every picture in an actual rover file system? I get the impression they were only testing the cameras or the capture software, not the holistic system.

Did they first simulate filling the filesystem with files generated during the actual trip to mars? Apparently not, because the system would have failed if they had actually put the rover software through a launch to end of mission simulation here on earth when the software was developed.

Dan East

--
Better known as 318230.

Re:Parent should be modded down by Anonymous Coward · 2004-02-22 02:53 · Score: 0

Wow, you're a whiney little bitch. Suck it up and move on, pee-wee.
Re:Parent should be modded down by KewlPC · 2004-02-22 07:55 · Score: 1

No, they would not have stored each of those 100000 images in an actual rover filesystem, because the rover is not meant to hold hundreds of thousands of images, and the test cameras were probably not attached to a rover. And yes, the point was to test the cameras themselves; to see how they'd respond under different conditions.

And they did a test mission with a test rover out in the desert. Basically, they put a test rover out in the desert somewhere and had the team go through it as though it were a real mission.

Lastly, you don't seem to realize that there's a limited launch window for going to Mars. If they had missed it, they'd have to wait two years before they could launch again.

Whether or not they knew about the problem before launch is kind of irrelevant. They'd have 6 or 7 months to work out the kinks in the on-site mission software while the rover was in transit to Mars. They knew about the problem before it actually became a problem, and had implemented a plan to fix it, but due to the weather at the Australia (IIRC) Deep Space Network site they weren't able to fully upload the utility meant to fix it.

And yes, according to the article it had occurred to them that the utility might not get completely uploaded due to weather.
Re:Parent should be modded down by Minna+Kirai · 2004-02-22 08:56 · Score: 1

And they did a test mission with a test rover out in the desert. Basically, they put a test rover out in the desert somewhere and had the team go through it as though it were a real mission.

Evidently they did not do that, or the bug would've been discovered.

but due to the weather at the Australia (IIRC) Deep Space Network site they weren't able to fully upload the utility meant to fix it.

It's acceptable that radio propagation may cause some messages to be lost or improperly received. But to not recognize that the command was incomplete and to execute it anyway is a gaping, fundamental system flaw.

Rule #0 of any remote digital control is that the receiving system must never execute any command without first verifying that the command message was received completely and correctly. (It's as easy as matching an MD5 sum!) To have ignored that basic software principle is a painful embarrassment to all US taxpayers.
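
For what it's worth, the kind of check described above looks roughly like this in Python. It is a generic sketch, not how the rover's uplink works (nothing in this thread says how that is done), and the function names and message layout are invented. MD5 is used only because it is the example named here; it catches transmission damage but is no defense against deliberate tampering.

```python
import hashlib

DIGEST_LEN = 16  # an MD5 digest is 16 bytes

def package(command: bytes) -> bytes:
    """Sender side: prepend the MD5 digest to the command."""
    return hashlib.md5(command).digest() + command

def receive(message: bytes) -> bytes:
    """Receiver side: return the command only if its digest matches."""
    digest, command = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if len(digest) < DIGEST_LEN or hashlib.md5(command).digest() != digest:
        raise ValueError("incomplete or corrupted command; refusing to run")
    return command

msg = package(b"delete launch-load files")
print(receive(msg))                 # intact message is accepted

try:
    receive(msg[:-5])               # simulate a message cut off in transit
except ValueError as err:
    print("rejected:", err)
```

The receiver never looks at the command contents until the digest check has passed, so a truncated upload is dropped rather than half-executed.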
Re:Parent should be modded down by KewlPC · 2004-02-22 09:59 · Score: 1

I don't think it was quite like that. The way I interpreted the article was that the utility was broken into two separate programs. The second one didn't get uploaded.

Anyway, I think the problem was that the utility didn't get run, and the resulting problems occurred because the filesystem didn't get cleaned up. I really doubt that NASA would be so stupid as to have the rover blindly execute whatever they sent it without some form of verification (at least a CRC32 or something).

And yes, for the last time, they DID do a test mission in the desert with a rover prototype. For the life of me, I can't find the link, but if you Google for it (I think the prototype rover was called Fido), I'm sure you'll find something.

Just because they did a test mission doesn't mean the bug would've been discovered. According to the article, most of the junk clogging up the filesystem was leftovers from software upgrades.

Meet my 1981 Escort by dk.r*nger · 2004-02-22 01:16 · Score: 1

Although dead now, it was alive at its 21st birthday.

A mighty fine piece of hardware. Might still have been running, had I not driven it into another car.

I don't think so... by twoslice · 2004-02-22 01:20 · Score: 0

Maybe it's all lies and the Martians hit Ctrl+Alt+Del...

They couldn't even find the start button. Arnie had to turn it on for them many years later and it was only a single button. I don't think that the Martians could figure out three buttons...

--

From excellent karma to terrible karma with a single +5 funny post...

Moderators? by Anonymous Coward · 2004-02-22 01:20 · Score: 0

Uh, the parent post is correct and modded down. Then in this same thread there is a +5 post that is totally wrong. Hello?

Engineers are some of the worst programmers. This is especially true when it comes to testing. Ugh.

Who would not have tested the file system filling up?! I mean, it's not like you'll have easy access to the device if there's a problem. You need to test everything you can. Testing something as simple as filling up the file system is routine. Unfortunately this is typical of the software from many programmers, especially engineer types.

Logging should not be limited ? by thrill12 · 2004-02-22 02:07 · Score: 3, Insightful

Seriously, from a developer viewpoint, that is all wrong.
I have worked on projects in which there was so much logging going on that you couldn't tell head from toe anymore. When a problem arose, scanning the logfiles proved very cumbersome indeed. Every developer had his own stuff logged, which sometimes proved interesting, sometimes proved utter crap (no one wants to know that variable XYZ was increased by 1 24943 times).

You should develop a well-thought-out logging strategy that increases the logging verbosity on a per-problem basis, not simply log everything that happens and hope you get some useful information.
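
A minimal sketch of that idea, assuming Python's standard logging module (the logger name and buffer size are made up for illustration): chatty DEBUG records are held in memory and only written out once something at ERROR level or above shows up.

```python
import io
import logging
from logging.handlers import MemoryHandler

def make_logger(stream, capacity=1000):
    """Return a logger whose records are buffered until an ERROR arrives.

    The logger name and capacity are illustrative, not from any real project.
    """
    target = logging.StreamHandler(stream)
    target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    # Records sit in memory and are flushed to `target` when a record at
    # ERROR or above arrives, or when the buffer reaches `capacity` records.
    buffered = MemoryHandler(capacity, flushLevel=logging.ERROR, target=target)
    logger = logging.getLogger("demo.problem_based")
    logger.setLevel(logging.DEBUG)
    logger.handlers = [buffered]
    logger.propagate = False
    return logger

out = io.StringIO()
log = make_logger(out)
log.debug("XYZ incremented")      # buffered, nothing written yet
quiet = out.getvalue()
log.error("file create failed")   # triggers a flush of the whole buffer
noisy = out.getvalue()
```

Note that a full buffer is flushed too, so this limits when the noise gets written rather than throwing it away; discarding old records instead would need a small custom handler.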

--
Slashdot: stuff for news, nerds that matter, matter for news, stuff that nerd

Re:Logging should not be limited ? by ZorbaTHut · 2004-02-22 06:52 · Score: 1

On a project I did a year ago, we had a few simple rules.

You don't have to justify logging anything in particular. However, any default logging must go to the console (GUI program, so this didn't disturb usability). Plus, we were running on a platform with really glacial console output.

The end effect was that any annoyingly large amount of logging would bog down the program so much that we'd figure out whose logging it was and go kill them. It worked quite well :) I ended up with maybe a hundred lines of logging per invocation, at most, with another few hundred at most in an algorithm area that was entirely mine (meaning that nobody else needed to decipher it.)

--
Breaking Into the Industry - A development log about starting a game studio.

Not lost forever by PeekabooCaribou · 2004-02-22 02:25 · Score: 2, Funny

Not lost forever, but lost until we travel to Mars and retool it as an extraterrestrial barbeque grill.

--
"I'll say it again for the logic-impaired." -- Larry Wall.

Re:Not lost forever by StarfishOne · 2004-02-24 05:39 · Score: 1

extraterrestrial barbeque grill? On those solar panels? :) Hmm ..but with some water -> hydrogen.. nice cookin' DOH! Now the real reason why they are so eager to find water hits me ;-)

Good posts! by electromaggot · 2004-02-22 02:53 · Score: 2, Interesting

...and I'm not saying that just because we agree. Yours are good additional insights (hence your "insightful" mods up! :-)

I agree with the reply-post below too, saying that if they'd made their system a bit more fault-tolerant, then the problem might have been more easily recovered from. Sixty reboots in a row in a day seems a little excessive! Don't they have counters to detect that very thing? Don't they have a failsafe/debug OS burned into ROM (not flash) to load automatically in just such an event? Such are the risks when you're reloading a whole new OS remotely!
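
To be concrete about the kind of counter I mean, here is a rough sketch in Python. It is purely an illustration, not how the rover does it: the threshold, the file name and the "failsafe"/"normal" labels are all hypothetical.

```python
import os
import tempfile

MAX_FAILED_BOOTS = 3  # hypothetical threshold

def read_count(path):
    """Read the persisted boot counter; a missing or corrupt file counts as 0."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        return 0

def write_count(path, n):
    with open(path, "w") as f:
        f.write(str(n))

def choose_image(path):
    """Bump the counter at every boot and pick which image to start."""
    count = read_count(path) + 1
    write_count(path, count)
    return "failsafe" if count > MAX_FAILED_BOOTS else "normal"

def mark_boot_ok(path):
    """Called once the normal image is up and healthy; clears the counter."""
    write_count(path, 0)

counter = os.path.join(tempfile.mkdtemp(), "boot_count")
history = [choose_image(counter) for _ in range(5)]  # five crashes in a row
mark_boot_ok(counter)
after_reset = choose_image(counter)
```

On real hardware the counter would presumably have to live somewhere that does not depend on the very filesystem that is causing the reboots; a plain file is used here only so the sketch runs anywhere.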

However, maybe they do have such things, or equivalent. I don't think their method of recovery was "accidental" (or a hack) either, although I'm making assumptions and I haven't seen their spec. The key is that they recovered from the error... and I now assume that they have recovered completely.

What I found interesting was NASA's initial assessment that the flash ROM was failing -- a hardware failure. The media jumped all over that and reported it, so the rest of us were thinking, "Great, the rover is crippled and will never be the same. :-( "

Now, turns out it was just a software error. Where's the mainstream media now? ("EE Times" is hardly mainstream!) Can the rover's recovery now be considered a "complete recovery"?

If this story goes mainstream, will it make NASA look bad for screwing up... or look good for making a full recovery? I'm not sure. (Of course, smart people make mistakes too, lots of them, but the key to being smart is covering your ass beforehand! :-)

The verdict is clear by Anonymous Coward · 2004-02-22 02:57 · Score: 0

They were sloppy.

The Poor Bastard. by Anonymous Coward · 2004-02-22 02:57 · Score: 1, Funny

Just imagine the poor bastard who had to carry the tomsrtbt floppy all those millions of miles up there, and stick it in the floppy drive!
Well, we all have to do stuff like that every day, and God help us if we run out of Magic Wands!

Not EVERYTHING... by Chemisor · 2004-02-22 03:08 · Score: 1

You might not be able to recover from every possible thing that might ever go wrong, but there is no excuse for not checking for a file creation error. If there is only one error you check for, it should be that. And there is no excuse for dying after not being able to create a file either. You should simply report the error and return to idle.
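
Something along these lines, as a sketch in Python (the function and file names are invented for the example): the file creation is checked, the failure is reported, and the caller gets a plain False back instead of a crash.

```python
import os
import tempfile

def save_record(path, data):
    """Try to create `path`; report failure instead of dying."""
    try:
        # Mode "x" fails if the file already exists; any OSError (disk full,
        # missing directory, permissions) lands in the except branch.
        with open(path, "x") as f:
            f.write(data)
    except OSError as err:
        print(f"could not create {path}: {err}")
        return False  # caller goes back to idle rather than crashing
    return True

workdir = tempfile.mkdtemp()
ok = save_record(os.path.join(workdir, "telemetry.dat"), "sample")
failed = save_record(os.path.join(workdir, "no_such_dir", "telemetry.dat"), "sample")
```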

Except just one thing: by Chemisor · 2004-02-22 03:11 · Score: 3, Insightful

> What on earth (or on Mars) could we possibly take away from this experience?

Rule 3: Never ignore the return value from open.

Translation by Gothmolly · 2004-02-22 03:22 · Score: 1

Hi, I didn't RTFA, and I don't have much to say about the Mars rover, so I'll spew some crap about wear-levelling, cuz I heard my friend's friend talk about it on his 'puter. I have a digital camera, you know, so I'm an expert on flash memory.

cat /dev/clue > Mr2cents
error: No space left on device

Dude, take a look at jffs, it has built in wear-levelling and is used extensively by those of us with a Zaurus, and probably 100's more appliances as well.

--
I want to delete my account but Slashdot doesn't allow it.

Re:Translation by Anonymous Coward · 2004-02-22 07:52 · Score: 0

1) I did read the article (you obviously didn't).
2) I read it again after reading your troll post to see if I missed something, but it didn't mention the questions I have.
3) I don't have a digital camera.
4) The knowledge I have comes from working with a Rabbit 2000 microcontroller.
5) I never claimed to be the ultimate expert, most probably you are mistaking me with MrExpert.
6) I already have a clue, but thanks anyway, mister I-own-a-zaurus-and-you-don't.

Re:Translation by swillden · 2004-02-22 08:19 · Score: 1

JFFS and JFFS2 are implemented only for Linux. The Rover does not run Linux.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Puhleese by Anonymous Coward · 2004-02-22 03:25 · Score: 0

"Repair costs for things like Cadallacs and BMW's are not cheap for TCO! "

Take a look again about BMW's in CR. They're generally the top rated car in their class, and they have average to above average warranty.

Hondas are good; I've owned 7 or 8. But BMW's are better. Hondas you can drive for 15 years. BMW's... you WANT to drive for 15 years.

That's the difference.

I've owned my BMW for 3 years and it has been absolutely flawless. Good gas mileage, excellent design, it delivers a driving experience unmatched by *anything* on the planet.

Why you think BMW's are the poster child for bad design and manufacturing process is a mystery.

Reprogramming space probes... by MikeyToo · 2004-02-22 03:57 · Score: 1

isn't an uncommon thing really. I can't find the article now, but I read something about Cassini (the probe to Saturn) a couple of years ago that said that the flight software for the orbital mission wasn't even written yet.

I wish that my computers contained a backup copy of the baseline OS somewhere. There have been a few times that I wished I could just flush them quickly. Yeah, I know. Quit using Windows, right? I will when I can get games I like on *nix.

--
"Well Ranger Brad, I'm a scientist. I don't believe in anything." - Dr. Roger Fleming

you, too, can have this capability on earth... by sommerfeld · 2004-02-22 03:58 · Score: 4, Informative

It's not that hard to pull off this sort of seemingly amazing remote recovery with pure off-the-shelf tech if you plan for it in advance and are willing to pay a modest premium.

You need remote serial console access -- ideally including firmware/bios serial console access -- and remote power cycling, controlled by a small embedded system, either in separate units (APC masterswitch, terminal servers) or as part of the system unit (common on Sun gear as "LOM"/"ALOM"/etc.; some of this is also creeping into x86 mobos). All this lets you regain control of the system remotely.

Then it becomes a matter of hardening the system to let you recover from various other insults. Never let go with both hands: Mirrored disks (protecting against hardware failure) and multiple bootable partitions (protecting against software or human error) can both be used; netbooting is also a nice capability to have when you've got a bunch of servers in the same place.

Disclaimer: I bet you can do much of the above with other people's gear, but I work for Sun and I know it works for me...

Re:Ran out of flash disk space. No, really. by Mister+G · 2004-02-22 04:00 · Score: 1

Welcome to the world of users-as-entropy-pool.

Lessons to learn by roman_mir · 2004-02-22 04:04 · Score: 1

1. Get 900 million USD.
2. Hire a team of rocket scientists.
3. Build duplicates of all equipment, let QA team run tests for a couple of years.

Yeah, I learned something, thanks!

--
You can't handle the truth.

To misquote the guy near the start of Total Recall by Anonymous Coward · 2004-02-22 04:24 · Score: 0

"Blue screen on Mars? That's original!"

Debugging? More Like De-Lousing... by etLux · 2004-02-22 04:26 · Score: 0

Hah! It may have been a hardware problem, but the "bug" involved was... yes, it's absolutely true... a Martian relieving itself on one of the Spirit motherboards: reported here. Instead of gee-whiz technology, what they really need is anti-whiz -- or maybe just some kitty litter.

Launching with incomplete code is common by rarose · 2004-02-22 04:33 · Score: 4, Interesting

The en route time for Cassini to get to Saturn was 7 years; rather than push back an already long mission they launched with feature-incomplete code. They knew they had 7 years to get the software fully functional and debugged; they've updated it remotely from millions of miles away a number of times now.

I'm sure the rovers did the same thing... Develop the launch/cruise software before you launch (and of course try to get as much of the entry/landing code done as you can!), and then uplink the final code before it's needed. Therefore it doesn't surprise me one bit that the JPL engineer knew there were shortcomings in the launch software.

Hell, I develop BIOS for servers and we do it all the time. The BIOS image we give the hardware engineers for initial bringup is usually *way* short of features that will be there when it actually gets used by the customers!

--
--Rob

Rover software team to blame by Anonymous Coward · 2004-02-22 04:45 · Score: 0

Wow, you're a whiney little bitch. Suck it up and move on, pee-wee.

It must suck to be wrong all the time and resort to childish comebacks like the one you just made.

The Mars Rover software team was lucky that their complete incompetence did not cost them the mission.

Yeah but the rovers are cooler by tjstork · 2004-02-22 04:46 · Score: 1

The government already funds way more medical research than does NASA. Health care in the United States costs more than a trillion a year. If they can't deliver all of these miracles you promise with the money that they have, they can't do it at all.

Testing everything on the ground is silly because you cannot duplicate either Mars or Deep Space on earth. NASA didn't get lucky - they did it right by design. If you can reprogram the craft while it is in flight, and have a robust capability to do so, then that is way more useful than a Mars simulation on the ground.

--
This is my sig.

Re:Yeah but the rovers are cooler by idiot900 · 2004-02-22 07:27 · Score: 1

The government already funds way more medical research than does NASA.

NIH's current budget is about $29 billion (NIH funds the vast majority of biology/medical research in the US). NASA's current budget is about $15.3 billion.

Not Insightful, just wrong by Anonymous Coward · 2004-02-22 04:51 · Score: 0

Apparently you read the article but missed the point entirely - their programmers and testers screwed up, plain and simple. They never anticipated this scenario and they were just lucky that it was remotely fixable. The programmers are not the 'heroes', but fortunate bumbling fools making a very junior QA mistake.

Re:only 120 megs ram? by DrDNA · 2004-02-22 05:15 · Score: 1

What is the optimal amount of RAM and flash to have? As much as possible is not the answer. The more you have, the more you increase your chances of getting corruption from radiation. Enough to do the job is probably the answer.

What I would like to see is redundant RAM and flash to avoid corruption. Something like a RAID disk array, but for RAM and flash.
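
A toy version of what I have in mind, sketched in Python with made-up data: keep three copies and take a bitwise majority vote on read. Strictly speaking this is closer to triple redundancy with voting than to RAID, and real hardware would more likely use error-correcting codes, so treat it as an illustration only.

```python
def vote(a, b, c):
    """Bitwise majority of three equal-length byte strings."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))

original = b"SPIRIT"
copy_a = bytearray(original)
copy_b = bytearray(original)
copy_c = bytearray(original)
copy_b[2] ^= 0x10  # simulate a radiation-induced bit flip in one copy
recovered = vote(copy_a, copy_b, copy_c)
```

A single flipped bit in any one copy is outvoted by the other two; two copies damaged at the same bit position would not be recoverable this way.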

--
If I have seen further, it is because I have stood on the toes of giants.

Re:Ran out of flash disk space. No, really. by SiliconEntity · 2004-02-22 05:33 · Score: 1

It was the inability to build the RAM-based directory structure of the files in the Flash memory.

Fine, but the point remains that it was a major screwup on NASA's part. They never communicated the fact that the memory-clearing download had failed and therefore the flash was still full of many more files (from the original launch OS image) than the rest of the team thought. That's why the flash was allowed to fill up (in terms of file count or data, it's not important). The engineers are very careful not to exceed the limits of their flash file system, but they were misinformed about what those limits were, since they had not been told that the erasure download had failed.

It was not Murphy's Law, and it was not a failure to simulate, verify, or test the software. It was a simple internal communications failure, where the right hand didn't know what the left hand was doing. The article lets NASA off too lightly by not emphasizing this failure (which is ultimately management's fault for not making sure that everyone on the team is aware of all crucial data).

Space Communications Protocol by ChrisCampbell47 · 2004-02-22 07:33 · Score: 1

I don't think they would bother using anything to do with TCP ... If it has anything to do with current internet protocols, it would be UDP.

Space Communications Protocol Standards (SCPS)

http://www.scps.org/

http://www.scps.org/Documents/SCPSoverview.PDF

--
One simple rule for its versus it's

If they had used Ninnle... by Anonymous Coward · 2004-02-22 08:02 · Score: 0

...as I suggested before, they wouldn't have to reboot at all. The rover never would have been lost! They have to stop using DOS for all their spacecraft!

Re:NASA should have simulated... by Kashif+Shaikh · 2004-02-22 08:26 · Score: 1

"The fact that they filled up the flash memory with too many files that were accumulated during the cruise phase of the mission between earth and mars was something that they should have known would happen."

How easy it is to describe a test after a problem appears. Not so easy when there are a million different things you want to test and some untested script statement (e.g. cp $EARTH_FILES /tmp/blah) failed _silently_.

Even worse to test and detect are periodic restarts due to some constant input.
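
For the silent-failure case, a small sketch in Python of checking every step's exit status. The commands are stand-ins (a Python one-liner instead of cp) so that it runs anywhere, and run_step is a name invented for this example.

```python
import subprocess
import sys

def run_step(cmd):
    """Run one step of a script; return True only if it exited with status 0."""
    try:
        subprocess.run(cmd, check=True)  # check=True raises on non-zero exit
    except (subprocess.CalledProcessError, OSError) as err:
        print(f"step failed: {err}")
        return False
    return True

# Stand-ins for real commands: one that succeeds, one that exits with status 1.
good = run_step([sys.executable, "-c", "pass"])
bad = run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
```

Of course, this only helps for the failures you thought to check; it does nothing for the periodic-restart kind.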

Night Pictures!!! by io333 · 2004-02-22 08:27 · Score: 1

Wow. That just set my mind spinning! Are there any pictures of what Mars looks like at night? If not, how can we ask NASA to take some?

Re:Night Pictures!!! by FrostedWheat · 2004-02-22 10:09 · Score: 1

It would look the same as it does on Earth (far away from any light pollution).

The stars would not twinkle as much, Earth would probably be a very bright planet and Jupiter would look bigger and brighter. Mars's moons are very small and fairly close to the planet so you'd rarely see them.

Re:Night Pictures!!! by Ignominious+Cow+Herd · 2004-02-22 12:56 · Score: 1

Dark! Really, very dark. Unlit. Dark Rust-Red actually.

Next question.

--
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.

Re:NASA should have simulated... by cybpunks3 · 2004-02-22 08:56 · Score: 1

Doesn't VxWorks have a way to run an automatic scheduled task?

Why would this task have to be manually sent all the time?

easy... by i+chose+quality · 2004-02-22 08:57 · Score: 1

Are there lessons here that we can use here on the third rock for recovery of our messed up machines which we manage from afar via ssh?

you can start by not sending it to mars... that is the lesson.

--
the computer is online
i am not at it
what a waste of resources

Re:WindRiver's fault by Anonymous Coward · 2004-02-22 09:07 · Score: 0

Off the record, the main reason for JPL not transitioning away from VxWorks is budgetary. They do not have the budget to switch to something else, despite the engineers feeling that some other OSes would be superior for their needs. Think about how much time and money it would take to retool, retrain, recode and reverify using a new OS.

Incidentally, on the record, JPL doesn't use Wind River's toolchain (i.e. Diab etc.) for Spirit; they use compiler/debugger/etc. from Green Hills Software, Inc. Only the OS itself is Wind River. I don't happen to have any links to back that up ATM but that at least has been stated publicly.

eh by bmajik · 2004-02-22 09:47 · Score: 1

anybody not factoring lifecycle costs into a car purchase is a dummy. But there are more variables than just the brand of automaker.

I've owned 2 BMWs, an Audi, a VW, a Ford Bronco II, a Toyota Celica, and a 1970 Oldsmobile Cutlass.

The most reliable of all those cars ?

The 1970 Oldsmobile. It did nothing. There was nothing to go wrong. Well, nothing until i blew up the 27 year old engine. Threw a rod through the block. Paid some guys $900 to do a motor swap. Reliability went to shit after that.. but once it was running it ran great (4bbl carb in the 13 year newer motor)

The Ford Bronco was the worst i think. It was a hodgepodge of hacked up stuff by the time I got it.

The Celica was rock solid but I let a friend borrow it and she reduced the clutch to literally nothing.. stranding her on an onramp. Probably not the car's fault, but it's a knock against its perceived reliability. I was able to drive the thing no problem (and very hard, i might add) so i don't know how she managed to put the car in an undrivable state in just a matter of hours.

I did the clutch replacement myself on that car. I am never again owning a transverse engined FWD japanese car. Working on them is pure bullshit.

I think your position that vehicle TCO should trump looks or other factors is silly. It's up to each person to decide what's important to them.

For instance, i'll likely never own a future toyota or honda product because with the exception of the MR2, Supra, S2000, and NSX, they make FWD economy appliance cars that are at best uninspiring to drive and at worst unsafe (inadequate brakes, woeful suspensions, inadequate acceleration on automatic 4 cylinder models, etc). The models i've excepted are rarities in their model range (and have higher maintenance costs than run of the mill honda/toyota vehicles)

If you were to make TCO the only factor in a car purchase, you'd ride the bus. Owning and insuring a car is an expensive proposition regardless of what you buy. And the cars you're talking about are nothing more than appliance transportation. When i get in a car, the drive is the point, not the destination. Driving in an uninspiring car is torture for me, so when i lived in Seattle i'd often take the bus even though we had 3 cars. When you factor in the joy of driving, or unfortunately, the desire for worldly status, it becomes clear why people buy BMWs - they are subjectively and objectively more fun to drive than Camrys.

There are other concerns - i.e. honda and toyota do not offer station wagons at all, much less station wagons with manual gearboxes. VW does, so now we have a VW. Yes, it will probably end up being more expensive than a honda accord. It also fits the requirements; an accord does not.

BTW - my Audi and BMWs are from the 1980s.
My first BMW was a 1980 528 5 speed. I bought it with 220,000 miles on the original drivetrain, and drove it _Very_ hard and sold it with 240k miles on it. The second BMW is an 88 model with 111k miles on it (i bought it with 98k). I drive it extremely hard (it has a 6900 rpm redline that i am very familiar with).

The Audi is an 88 model that i just bought with 192k miles on it (now has 196k)

German engines are notoriously bulletproof and well made. The 88 BMW uses a detuned race engine of which less than 5000 examples exist in North America. Yeah, that one is expensive to maintain and work on, but it's also more motor than anything you can get from honda, even today (except for the NSX motor). And it's 15 years old (and a 25 year old design).

--
My opinions are my own, and do not necessarily represent those of my employer.

QNX by xtal · 2004-02-22 09:56 · Score: 1

www.qnx.com - free beer demo

I suspect the reason you don't see QNX used more is that it isn't American, it's Canadian, but maybe that's just the canuck in me coming out. QNX is a really interesting OS, and no, I don't have any affiliations with the company.

(RT) Linux is a long way from true RTOS performance, or at least it was when I last looked at it.

--
..don't panic

They do! (At least for some projects) by voodoo1man · 2004-02-22 10:39 · Score: 1

For example, critical parts of the Remote Agent autonomous spacecraft system (which flew successfully on the DS1 mission) were verified using SPIN. Unfortunately, the team did not have enough resources to verify all of the system, and although they found bugs in the parts they did analyze (most (all?) of these were race conditions), during flight one of the parts of the system they didn't verify (but which was thought "safe") caused a race condition. One of the team members talks about it in a USENET post.

Another interesting thing about the RA experiment is the way the error was found and fixed. Because the RA was written in Lisp, it had interactive debugging and loading features, and the race condition was diagnosed without having to stop the experiment, and patched without having to reload the whole system. The same project team member (Erann Gat) said it "proved invaluable in finding and fixing the problem."

--

In the great CONS chain of life, you can either be the CAR or be in the CDR.

Re:NASA should have simulated... by gentoo_is_hyped · 2004-02-22 10:55 · Score: 1

Scary to think of all that hardware flying around up there run by a dilbert culture!

--
[Gentoo is hyped. Modded into the ground to suppress opinion]

skill or luck? by Nykon · 2004-02-22 14:01 · Score: 1

# ping rover.nasa.gov
Request timed out

# ping rover.nasa.gov
Request timed out

# ping rover.nasa.gov
Request timed out

# ping rover.nasa.gov
PING rover.nasa.gov (192.168.1.143): 56 data bytes
64 bytes from 192.168.1.143: icmp_seq=0 ttl=50 time=6043.446 ms

--
"It's better to be a pirate than join the Navy"

You by Anonymous Coward · 2004-02-29 15:15 · Score: 0

are a fat dicklicking faggot. HUAGLUALAUGHLAUGH, HTH, HAND. Don't use so many caps.

390 comments