Slashdot Mirror


User: Eric Roman

Eric Roman's activity in the archive.

Stories
0
Comments
1

Comments · 1

  1. Checkpoint/Restart for Linux on UNIX Process Cryogenics? · Score: 1

    For those of you interested: I'm part of a group developing checkpoint/restart for Linux. We're still fairly early in the project, but we plan to add this feature to Linux soon. (We're hoping to have a patch/module release out in May.)

    We're putting two features in: Checkpoint/Restart and Suspend/Resume. Checkpoint/Restart allows you to save a running session or process to disk and restart it later, either on a different node or after a system reboot. Suspend/Resume does more or less the same thing, but keeps the process data structures in the kernel without writing them to disk. Suspend/Resume won't work through a reboot, but it's useful for certain applications. You can think of it as a combination of swapping the process to disk and hitting ^Z to stop the process.
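The ^Z half of that analogy can be tried from user space today. The sketch below is only an illustration of stopping and continuing a process with standard job-control signals; it is not the project's kernel mechanism, and the function name `stop_and_continue` is made up for this example. It also uses SIGSTOP rather than the SIGTSTP that ^Z actually sends, because SIGSTOP cannot be caught or ignored by the child.

```python
import os
import signal
import subprocess
import sys


def stop_and_continue():
    """Stop a child process, confirm it stopped, then let it run again.

    Returns a pair of booleans: (child was stopped, child was continued).
    """
    # A throwaway child that just sleeps; any long-running process would do.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # SIGSTOP freezes the child. Its memory and kernel data structures
        # stay where they are; nothing is written to disk.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # SIGCONT lets the child pick up exactly where it left off.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the throwaway child.
        child.kill()
        child.wait()
    return was_stopped, was_continued
```

A stopped process like this still lives only in memory, so it is lost on reboot, which is the same limitation the post describes for Suspend/Resume.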

    We're putting in some signalling mechanisms to allow the process to catch the checkpoint, restart and continue signals. We're also adding code to capture data in pipes and FIFOs. It'll work with multi-threaded processes and with full UNIX sessions, so you could checkpoint, say, a login shell and e-mail it to all of your friends. :)
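To make the idea of a process catching a checkpoint signal concrete, here is a rough application-level sketch. Everything in it is an assumption for illustration: the post does not say which signals will be used, so SIGUSR1 stands in for a checkpoint signal, and the `Counter` class, its JSON file format, and `demo` are hypothetical. Unlike the kernel-level work described above, this program has to save and reload its own state.

```python
import json
import os
import signal
import tempfile

# Hypothetical stand-in: the post does not name the real signals.
CHECKPOINT_SIGNAL = signal.SIGUSR1


class Counter:
    """Toy application state that can save itself and be restarted."""

    def __init__(self, path):
        self.path = path
        self.value = 0
        self.checkpoint_requested = False

    def request_checkpoint(self, signum, frame):
        # Keep the handler tiny: only record that a checkpoint was asked for.
        self.checkpoint_requested = True

    def step(self):
        # One unit of work, followed by a check for a pending request.
        self.value += 1
        if self.checkpoint_requested:
            self.save()
            self.checkpoint_requested = False

    def save(self):
        # Write to a temporary file, then rename it into place, so a
        # half-written checkpoint never replaces a good one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"value": self.value}, f)
        os.replace(tmp, self.path)

    @classmethod
    def restart(cls, path):
        # Rebuild the state from the last saved checkpoint.
        with open(path) as f:
            data = json.load(f)
        counter = cls(path)
        counter.value = data["value"]
        return counter


def demo():
    """Run a few steps, request a checkpoint by signal, then restart."""
    path = os.path.join(tempfile.mkdtemp(), "counter.ckpt")
    counter = Counter(path)
    signal.signal(CHECKPOINT_SIGNAL, counter.request_checkpoint)
    for _ in range(3):
        counter.step()
    signal.raise_signal(CHECKPOINT_SIGNAL)  # simulate an external request
    counter.step()  # this step notices the request and saves
    counter.step()  # work done after the checkpoint is not saved
    restored = Counter.restart(path)
    return counter.value, restored.value
```

Calling `demo()` from the main thread returns the live counter value and the value recovered from the checkpoint file. The handler only sets a flag because doing file I/O inside a signal handler is fragile; the actual save happens at a safe point in the main loop.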

    Our checkpoint/restart is meant for scientific applications, but should work on just about anything else. We're going to spend this summer hanging out with the LAM crew to make it work correctly with MPI applications.

    For those of you looking for something to download, I'm sorry that I can't post a working link or any code right now. We just got past our requirements document, and we're putting the design specs together now. The requirements document is due to be published next month, and an implementation survey is coming out in March. If you're interested in having a look at those, drop me a line, and I'll let you know when they're available.

    - ERoman at (no spam) lbl dot gov