UNIX Process Cryogenics?
shawarma asks: "Due to a recent
power outage, I've had to shut down a server running a process that had
been running for ages calculating something. The job it was doing would
have been done in a few days, I think, but I had to shut it down before the
UPS ran out of juice. This got me thinking: Why can't I freeze down the
process and thaw it back up at a later time? It ought to be possible to take
all the connected memory pages and save them in some way, preserve file
handles and pointers, and everything. Maybe network connections would die,
but that's understandable. Has any work been done in this field? If not,
shouldn't there be? I'd like to contribute in some way, but I think it's a bit
over my head." Laptops have been doing this in some form for years:
most laptops, when they run out of power or when told to by the user, will
go into "suspend" mode, which is similar to what the poster is describing.
However, outside of laptops, I haven't seen this done. The kernel already
does something similar for sleeping processes, paging their memory out to
swap so other running processes can use it. What, if anything, is preventing
someone from taking this a step further?
External dependencies might include open files (what if you freeze the process and then delete the file?), open TCP sockets to daemons elsewhere that wouldn't get frozen, subprocesses, etc. These would probably have to be revived, but how?
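Open regular files are the easiest of those to revive. Here is one approach, sketched in Python purely as an illustration. The function names `describe_open_file` and `reopen_file` are hypothetical and not any existing API. At freeze time you record each file's path, offset and inode. At thaw time you reopen the file and seek back, and you refuse to continue if the file was deleted or swapped out in the meantime. TCP sockets have no such simple fix, since the remote end keeps its own connection state.

```python
import os

def describe_open_file(f):
    """Record what is needed to reopen a regular file opened in binary mode.

    Stores the absolute path, the current offset, and the device/inode pair,
    so a later restore can tell whether the file was deleted or replaced.
    (An inode match is only a heuristic: inode numbers can be reused.)
    """
    st = os.fstat(f.fileno())
    return {
        "path": os.path.abspath(f.name),
        "offset": f.tell(),
        "dev": st.st_dev,
        "ino": st.st_ino,
    }

def reopen_file(desc, mode="rb"):
    """Reopen a file recorded by describe_open_file() and seek to its old offset.

    Raises FileNotFoundError if the file was deleted after the freeze, and
    RuntimeError if a different file now sits at that path or the file is
    now shorter than the saved offset.
    """
    st = os.stat(desc["path"])  # raises FileNotFoundError if it was deleted
    if (st.st_dev, st.st_ino) != (desc["dev"], desc["ino"]):
        raise RuntimeError("file was replaced since the freeze")
    if st.st_size < desc["offset"]:
        raise RuntimeError("file was truncated since the freeze")
    f = open(desc["path"], mode)
    f.seek(desc["offset"])
    return f
```

Binary mode matters here. In text mode, `tell()` returns an opaque cookie rather than a byte offset.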
This sounds like common sense to me. You never know when the disk is going to poop, the power will shut off, or the network will reset.
At my old job, we were required to record the status of all jobs that took longer than an hour (on a 6-CPU SGI). They never crashed on their own, but I would usually interrupt them if the requirements changed or whatever. If they ever did crash, there was a record of exactly where they had left off.
For any program you intend to run for more than a day or two, you should checkpoint its intermediate results to disk, even if this adds 100% to the run time.
--Blair
P.S. Alternatively, you could write a program to have the rebooted computer pull Scrabble tiles from a bag structure and print them to the screen. You might at least get some clue as to whether it was asking the right question.
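Here is a minimal Python sketch of the checkpointing Blair describes. It assumes the job's state fits in a small JSON-serializable dict. The toy job (a running sum of squares) and the function names are placeholders, not anything from the original post. The detail worth copying is how each save is done: the state goes to a temporary file, which is then renamed over the old checkpoint. That way a crash in the middle of a save never destroys the last good one.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Atomically replace the checkpoint at `path` with `state`.

    The temp file lives in the same directory so os.replace() is a rename
    within one filesystem, which POSIX makes atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave a half-written temp file behind
        raise

def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None

def run_job(path, total_steps, every=1000):
    """Toy long-running job: sum of i*i for i < total_steps.

    Resumes from the checkpoint at `path` if there is one, and saves
    progress every `every` steps plus once at the end.
    """
    state = load_checkpoint(path) or {"i": 0, "acc": 0}
    i, acc = state["i"], state["acc"]
    while i < total_steps:
        acc += i * i
        i += 1
        if i % every == 0:
            save_checkpoint(path, {"i": i, "acc": acc})
    save_checkpoint(path, {"i": i, "acc": acc})
    return acc
```

If the power dies mid-run, calling `run_job` again with the same path picks up from the last multiple of `every` instead of starting over.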
The comments to the effect of "it's called hibernation, and has done it for years" are missing the point. That hibernation is a BIOS-supported dump to disk. It's a feature on most laptops and works with just about any OS -- it's worked on my Linux laptop for years.
I think the feature to be discussed is operating system (not BIOS) level support for hibernating a single process. It'd be nice if I could do a:
kill -HIBERNATE `cat /var/longoperation.pid`
and have that program get frozen to disk. If I could then resurrect just that process later, it'd be a handy feature for a long-running program that you want to postpone until after you've done whatever you needed to do in single-user mode.
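There is no HIBERNATE signal. `kill -STOP` and `kill -CONT` will freeze and thaw a process, but only in memory, so the frozen process does not survive a reboot. Short of OS support, the program has to cooperate and treat some signal as "save and exit." Below is a minimal Python sketch of that. It assumes `SIGUSR1` as the stand-in signal and a caller-supplied `save` function; both choices are illustrative. The handler only sets a flag. The saving itself happens in the main loop, at a point where the job's state is consistent.

```python
import os
import signal

# Set by the signal handler; polled by the main loop.
hibernate_requested = False

def _request_hibernate(signum, frame):
    # Keep the handler trivial: just record that a save was requested.
    global hibernate_requested
    hibernate_requested = True

# SIGUSR1 is an arbitrary stand-in for the imaginary "HIBERNATE" signal.
signal.signal(signal.SIGUSR1, _request_hibernate)

def work_until_hibernate(start, stop, save):
    """Process units start..stop-1, stopping early if SIGUSR1 arrives.

    On a hibernate request, calls save(i) with the next unit still to be
    done and returns (i, True); otherwise returns (i, False) when the loop
    runs to completion.
    """
    i = start
    while i < stop:
        if hibernate_requested:
            save(i)
            return i, True
        # ... one unit of the real computation would go here ...
        i += 1
    return i, False
```

The program would also have to write its own PID (`os.getpid()`) to /var/longoperation.pid at startup. After that, the command below makes it save and exit, and restarting it from the saved index resumes the work:
kill -USR1 `cat /var/longoperation.pid`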
There are more than power problems to worry about with a long-running process: other hardware failures, scheduled downtime, and system crashes. Just because in this instance it was a power failure that made him wish he had this ability doesn't mean it wouldn't be useful in other circumstances.
Why are software techniques shit today compared to yesterday?
Because we're hopelessly caught up in trying to reinvent a somewhat limited computing paradigm (UNIX). No one, except for some CompSci projects that never really go anywhere, has any real interest in making a new operating system that builds on the lessons of all the previous operating systems and includes reasonable features like process checkpointing/suspension.
I'd bet there are patent considerations as well -- maybe many of the good OS features are not reproducible due to existing patents.