Slashdot Mirror


UNIX Process Cryogenics?

shawarma asks: "Due to a recent power outage, I've had to shut down a server running a process that had been running for ages calculating something. The job it was doing would have been done in a few days, I think, but I had to shut it down before the UPS ran out of juice. This got me thinking: Why can't I freeze down the process and thaw it back up at a later time? It ought to be possible to take all the connected memory pages and save them in some way, preserve file handles and pointers, and everything. Maybe net-connections would die, but that's understandable. Has any work been done in this field? If not, shouldn't there be? I'd like to contribute in some way, but I think it's a bit over my head." Laptops have been doing this in some form for years: most laptops, when they run out of power or when told to by the user, will go into "suspend" mode, which is similar to what the poster is describing. Outside of laptops, however, I haven't seen this done. Sleeping processes also do something similar, sending their memory pages into swap so other running processes can use the memory. What, if anything, is preventing someone from taking this a step further?
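
For context, POSIX systems already offer an in-memory version of "freeze and thaw" through the job-control signals SIGSTOP and SIGCONT. The sketch below is only an illustration under stated assumptions: it assumes a POSIX system, the child command is a placeholder, and `freeze_and_thaw` is a hypothetical helper name. A stopped process still lives in RAM or swap, so this does not survive a reboot or power loss, which is the part the question is really about.

```python
import os
import signal
import subprocess
import sys


def freeze_and_thaw():
    """Stop a child with SIGSTOP, confirm it is stopped, then resume it."""
    # Placeholder workload: a child that just sleeps for a while.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # Freeze: the kernel stops scheduling the process.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report stopped (not only exited) children.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # Thaw: the process continues exactly where it left off.
        os.kill(child.pid, signal.SIGCONT)
        still_running = child.poll() is None
        return was_stopped, still_running
    finally:
        # Clean up the placeholder child.
        child.kill()
        child.wait()
```

Calling `freeze_and_thaw()` returns a pair of booleans: whether the child was observed in the stopped state, and whether it was still alive after being resumed.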

555 comments

  1. Use Windows XP by Red+Avenger · · Score: 1, Informative

    Windows XP has two features that are improved over Windows 2000: suspend and hibernate. Suspend is a low-power standby mode that keeps all of your applications up and running for when you come back. Hibernate actually saves everything to disk and shuts the computer all the way off. When you come back, everything you were working on is there.

    Good luck.

    1. Re:Use Windows XP by mindstrm · · Score: 1, Informative

      These features are far from new in XP.
      My old win95 laptop could suspend/hibernate.
      My old DOS laptop before that could as well.

    2. Re:Use Windows XP by mindstrm · · Score: 2

      They do exist in other systems, or at least, they work on other systems.

      My laptop has no problems suspending/hibernating linux.

      The question here is about process hibernation, not the whole box.

    3. Re:Use Windows XP by GeckoX · · Score: 1

      And just who said we're only talking about *nix here?
      Who's the moron?

      I believe you made an -umption

      --
      No Comment.
    4. Re:Use Windows XP by Ewan · · Score: 2, Funny

      The difference is that suspending a laptop is done using hardware, but the suspend mode in WindowsXP is done in software, so desktop PCs can do it without additional functionality.

      Ewan

    5. Re:Use Windows XP by Lukey+Boy · · Score: 2

      He's talking about on a process-level, as in freeze a lengthy game of Asteroids and restore it later. Hibernation is system-wide, not on a process-by-process basis. And Linux has that too ;-) Note: this comment is reused!

    6. Re:Use Windows XP by GeckoX · · Score: 1

      Doh...

      Supposed to read:

      [ass]-umption[/ass]

      --
      No Comment.
    7. Re:Use Windows XP by LadyLucky · · Score: 1

      Windows 2000 can certainly do it in software. I did it to my desktop just to prove it. It was a little flaky, and it left a god-awful .sys file on the C: drive that took me ages to work out how to kill, but it did work.

      --
      dominionrd.blogspot.com - Restaurants on
    8. Re:Use Windows XP by Anonymous Coward · · Score: 0
      And just who said we're only talking about *nix here? Who's the moron?

      *sigh* I hate responding to obvious trolls, but...

      THE ARTICLE DID

    9. Re:Use Windows XP by Anonymous Coward · · Score: 0

      What's the title of the story?

    10. Re:Use Windows XP by Anonymous Coward · · Score: 0

      I said so, .... moron

    11. Re:Use Windows XP by fliplap · · Score: 1

      There's been software around since Windows 95 to do hibernate on desktops. I had an IBM desktop years ago that came with hibernate support.

    12. Re:Use Windows XP by Anonymous Coward · · Score: 0
      (From the original Anonymous Coward poster)

      Did you even read the article TITLE? "UNIX PROCESS CRYOGENICS".

      Who's the moron?

      I believe you've answered that for all of us.

      Moron.
    13. Re:Use Windows XP by rlowe69 · · Score: 4, Redundant

      This comment is far from (Score:4, Informative) ... it's not even relevant. We're not talking about the whole OS hibernating, we're talking about saving the execution state of an executing process so that it can be resurrected later and continued (ie. if a reboot is necessary).

      --
      ----- rL
    14. Re:Use Windows XP by sulli · · Score: 1, Flamebait

      Perhaps it actually works in XP? My 98 laptop suspends without difficulty, but it BSODs about half the time on wake-up - not fun.

      --

      sulli
      RTFJ.
    15. Re:Use Windows XP by Anonymous Coward · · Score: 0

      Speaking of morons...

    16. Re:Use Windows XP by Anonymous Coward · · Score: 0

      It would achieve the same purpose, right? I run Windows 2000 at home and I use 'hibernate' a lot. Saves boot time (the whole thing, with HTTP, FTP and SMTP/POP3 servers loads in under 20 seconds) and lets you save the state of any program.

    17. Re:Use Windows XP by Xawen · · Score: 1

      It is semi-relevant given the situation. He had a limited time to save the process state before shutting down the system. Sending the computer into hibernate mode would have achieved both these tasks with relative ease.

    18. Re:Use Windows XP by Anonymous Coward · · Score: 0

      Uhh...mods? How is this flamebait when it's true?

    19. Re:Use Windows XP by Anonymous Coward · · Score: 0

      The moderators are on crack...funny???

    20. Re:Use Windows XP by Anonymous Coward · · Score: 0

      Please enlighten me as to how a simple factual statement such as the parent warrants a Troll???

      CRACKWHORE MODERATORS

    21. Re:Use Windows XP by Red+Avenger · · Score: 1

      Actually I think hibernate or suspend would work quite nicely for this chap. His whole machine was going down not just the process. Why wouldn't you want to save the state of the whole machine when power failure is imminent?

    22. Re:Use Windows XP by Red+Avenger · · Score: 1

      Actually the situation this guy describes is the whole machine is going to have to go down due to power loss. Why wouldn't you want to save the state of the whole machine?

    23. Re:Use Windows XP by Anonymous Coward · · Score: 0

      I agree. And these are never the moderations that come up for meta-moderation, so we can't even spank the moderators.

    24. Re:Use Windows XP by JSmooth · · Score: 0

      No wonder you are an anonymous coward. Someone shares a valid answer to a question that was asked and you call them a moron? Shame on you.

      I agree the MS version of hibernate can be effective for a full OS freeze, but I would also look at how SQL transactions occur. The idea of a transaction log and update of the DB is what you are looking for. This would not only give you the ability to start/stop at will but would even recover in the event of some type of corruption or system failure.

      don't reinvent the wheel, use it again.

    25. Re:Use Windows XP by Archie+Steel · · Score: 1

      You can also suspend the whole system with just about any OS that runs under VMware (Windows 9X/NT/2k, Linux, etc.) - though it seems that when I do it more than twice in a row it gets buggy...

      As far as suspending processes goes, I don't think Windows can actually do that, and I haven't heard about other OSes that can actually do that...

      --

      Reminder: find a new sig
    26. Re:Use Windows XP by Ewan · · Score: 1, Offtopic

      I did wonder, i also got an overrated and an underrated :)

    27. Re:Use Windows XP by sklib · · Score: 1

      It's not possible to hibernate a single process.

      During thawing, to restore the process's memory structure, one would have to do one of two things: Either put the process *exactly* where it was before in system memory, which may not be possible because other programs (perhaps even the OS?) are running in that memory space now.
      The other option is to reallocate new memory for the process, and then go through and fix every pointer in the process to point to new memory locations. I will remind you that this is not possible, because processes can do very strange things with pointers and it's not possible to keep track of all of them from the object code side.
      Now, if the process could hibernate itself... well that's the same as hitting Save, and Exit in any program.

      So the only problem here is that programs that take weeks and/or months to compute stuff need to be written in such a way that you can save every once in a while, so when the power DOES go out, you don't lose that much of what you've processed.

      In my opinion OS-level hibernation (which already exists for many windows versions, and seems like it should exist for those big mainframes) coupled with some smart programming (no intractable problems here) would put a thorough end to these shenanigans with losing months of processing time just because the power went out 5 minutes before it finished.

      --
      -S
    28. Re:Use Windows XP by Anonymous Coward · · Score: 1, Interesting
      Actually, modern VM makes this "hard part" completely trivial. Each process has its own address space. The only possible foul-up would be shared library mappings, but I suspect that's easy to fix.

      Also, smart programming is not a valid requirement. Much critical long running code is written by noncomputer people, e.g. physicists.

    29. Re:Use Windows XP by taliver · · Score: 2, Insightful

      It's not possible to hibernate a single process.

      Wow, so the fact that it's been done here is just a red herring?

      Does Virtual Memory mean anything to you?

      --

      I demand a million helicopters and a DOLLAR!

    30. Re:Use Windows XP by cscx · · Score: 1

      That file (hiberfil.sys on XP; it's there by default) is used to dump your memory to disk. It's proportional to the amount of RAM you have in your system.

    31. Re:Use Windows XP by sulli · · Score: 0, Troll

      Truth is always flamebait on slashdot ;-)

      --

      sulli
      RTFJ.
    32. Re:Use Windows XP by AndyChrist · · Score: 1

      Well, you're seeing people going nuts here, because of linux/unix/mac zealots seeing someone actually saying something good about, even RECOMMENDING a Microsoft OS.

    33. Re:Use Windows XP by innocent_white_lamb · · Score: 1

      Why wouldn't you want to save the state of the whole machine?

      Because, as a general-purpose thing, it would be handy to be able to freeze a process and bring it back later on occasion and not just when the power is going off.

      --
      If you're a zombie and you know it, bite your friend!
    34. Re:Use Windows XP by rlowe69 · · Score: 2
      Why wouldn't you want to save the state of the whole machine when power failure is imminent?

      Because that's not what he asked. He asked, and I quote:

      Why can't I freeze down the process and thaw it back up at a later time? It ought to be possible to take all the connected memory pages and save them in some way, preserve file handles and pointers, and everything.


      This has different implications. Let's say that you have to turn off your system to replace a noisy fan, but you have a process going that could take a few days (a render farm or cluster is a place where this might happen). You'd like to pause it and then resume it once the computer is back on. In order to do that, you'd have to save EVERY piece of information associated with the running process like memory used, files, etc. THIS is what the guy is talking about, not hibernating the whole computer (which, if the computer is running many processes, could be an extremely bad use of hard disk space, not to mention time consuming - time is something you don't have when running off a UPS).

      Cliff's only made the situation worse by saying "Laptops have been doing this in some form for years", but really "in some form" is a generalized stretch. It seems to me that it's likely much more complicated to save and restore one specific process than it is to save and restore all of them in one big dump back into memory when the system recovers.
      --
      ----- rL
    35. Re:Use Windows XP by IsaacW · · Score: 1

      Actually, most any process running on a heavily loaded computer is "hibernated" potentially many times per second, during the course of normal context switches by a preemptive multitasking kernel. Though the process data is not written to disk, there is a certain amount of time during which the processor knows absolutely nothing about any process that is not currently running. All of this is managed during the call for a context switch, and it would be relatively simple (as simple as kernel programming gets, anyway) to make a call that would stop a process and write all the relevant data out to a file on disk, which would be able to be read and re-inserted into the process table without incident.

      A more interesting note with this would be security. What would happen if someone froze a process, altered the dump file, then restarted it to possibly gain privileges beyond their normal user class? Obviously, such a freeze/restore call should only be allowed by trusted users, and even then perhaps some kind of checksumming should be done to identify that the process was not tampered with during its downtime.
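
The tamper check suggested above can be sketched roughly as follows. This is an illustration, not a description of any real kernel facility: it swaps the plain checksum for a keyed HMAC, because anyone able to edit the dump file could also recompute an unkeyed checksum. The names `seal` and `verify`, and the assumption that only the trusted freezer/restorer holds `key`, are hypothetical.

```python
import hashlib
import hmac


def seal(dump: bytes, key: bytes) -> bytes:
    """Compute an authentication tag over a frozen-process dump."""
    return hmac.new(key, dump, hashlib.sha256).digest()


def verify(dump: bytes, key: bytes, tag: bytes) -> bool:
    """Return True only if the dump still matches the tag made at freeze time."""
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(seal(dump, key), tag)
```

At freeze time the tag would be stored alongside the dump; at thaw time the restore would be refused if `verify` returns False.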

    36. Re:Use Windows XP by sysop0130 · · Score: 0

      Nothing works right on XP...

      --
      -------
      "People who do not break things first will never learn to create anything." -Philippine Proverb
    37. Re:Use Windows XP by Anonymous Coward · · Score: 0

      If you are using Windoze family products - you are already THE moron. I like to see all of Windoze just go into the limbo.

    38. Re:Use Windows XP by dave256 · · Score: 1

      This was included in WinME as well as XP and 2000 (with service packs).

      It doesn't work. Specifically, all the applications involved have to play nice with the OS when it says 'Stop what you're doing.' Norton is especially pissy.

    39. Re:Use Windows XP by Red+Avenger · · Score: 1

      Actually it works great for me on Windows XP. I think the feature is much improved in XP. I don't run Norton so no problems here. . .

    40. Re:Use Windows XP by Anonymous Coward · · Score: 0

      Or OS/2 Warp 4- it has the hibernate feature.

    41. Re:Use Windows XP by Anonymous Coward · · Score: 0

      Fuck Windoze xp. It says "unix process cryogenics, doesn't it?"

  2. Hibernation? by GeckoX · · Score: 0, Offtopic

    Is what you're asking for not just hibernation?

    Fully available on Win2k/XP etc, works just a treat.

    No idea of anything comparable elsewhere, i.e.: for linux but the concept is neither new or unheard of.

    --
    No Comment.
    1. Re:Hibernation? by Lukey+Boy · · Score: 1

      He's talking about on a process-level, as in freeze a lengthy game of Asteroids and restore it later. Hibernation is system-wide, not on a process-by-process basis. And Linux has that too ;-)

    2. Re:Hibernation? by Proteus+Child · · Score: 1
      No idea of anything comparable elsewhere, i.e.: for linux but the concept is neither new or unheard of.

      I've just tried it on my laptop (a Dell Latitude CP (model M233SD)) running Slackware Linux v8.0 (kernel revision 2.4.17). Nothing fancy, no unusual hardware, just a PCMCIA network card. I closed the lid while it was running and the system (presumably) went into suspend mode: The hard drive spun down, the display turned itself off (no glow could be seen around the edges of the lid), the .mp3 stopped playing, and a soft 'tweet!' could be heard from the speaker. Then I started typing this. I just opened the lid back up, and the drive spun back up, the display kicked back on, and after a snapping sound the .mp3 started playing again.

      Looks like it worked.

      --

      Proteus' Child

      Doko ni datte; hito wa, tsunagette iru.

    3. Re:Hibernation? by Anonymous Coward · · Score: 0

      Use java, and serialize a runnable to disk.

    4. Re:Hibernation? by Anonymous Coward · · Score: 1

      The example was of the power going out, in which case you might as well hibernate the whole system.

    5. Re:Hibernation? by -douggy · · Score: 2

      Acer TravelMates (well, my 312T) do the same thing in SUSE Linux. If you shut the case it goes into suspend mode (function key + F3). To hibernate fully I needed to leave a 40MB FAT32 partition to dump the RAM to, as the BIOS didn't seem to want to dump to the Linux partition.

      Obviously I don't use the laptop for large numerical simulations, but I just tested it with a Fortran number-cruncher program running and it woke up fine.

  3. the mode you are speaking of by Stone+Rhino · · Score: 2, Informative

    is not suspend, it is hibernate. Suspend will power down the computer except for the energy needed to keep the RAM alive. Hibernate will save all data from memory to disk. I, personally, use neither on my laptop.

    --


    Remember, there were no nuclear weapons before women were allowed to vote.
    1. Re:the mode you are speaking of by Timbo · · Score: 2, Informative

      What you refer to as suspend is what most people (and APM) call standby. What you call hibernate is what APM refers to as suspend. I believe Windows uses the term hibernate to refer to a software suspend function.

    2. Re:the mode you are speaking of by CmdrPinkTaco · · Score: 1

      Ok, I'm going to show my (extreme) ignorance here, but what is the difference between what the Ask Slashdot is asking for and a Journaling File System? I don't claim to know anything about JFS, so any insight into this would be great. Thanks in advance.

      --
      Please give your mod points to others, Im at the cap. They will appreciate it more
    3. Re:the mode you are speaking of by Anonymous Coward · · Score: 0

      JFS - disk integrity is guaranteed even if you pull the plug during a disk write.

      Writing new data to a disk is a multi-step process
      1) Allocate a free block
      2) write data to the block
      3) update the file's length.

      Consider what happens if the plug is pulled before all 3 steps are done.

      Maybe the blocks will be unused, but marked as in use. Maybe the length of the file will be less than the contents truly are. Maybe the length of the file will be greater than the contents truly are.

      The Ask Slashdot question is asking about suspending a process, dumping its memory to disk, and being able to restart it at any time in the future. Windows does this quite well with supported hardware (all memory is dumped to disk and Windows shuts down; at startup, Windows just overwrites its RAM with the contents of the file and starts up again).
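
The multi-step write problem described above is what a journal is meant to solve. The toy model below is only a sketch of the general write-ahead idea, not how any particular journaling file system is implemented: the record layout, `journal_records`, and `replay` are all hypothetical. A change is recorded first, then a commit marker; after a crash, only changes whose commit marker reached the journal are applied.

```python
import json


def journal_records(txid, data):
    """Build the journal lines for one change: the intent, then the commit."""
    return [
        json.dumps({"tx": txid, "data": data}),
        json.dumps({"commit": txid}),
    ]


def replay(journal_lines):
    """Return the data of every transaction that was fully committed."""
    pending = {}
    committed = []
    for line in journal_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A torn record means the plug was pulled mid-write: stop here.
            break
        if "commit" in record:
            if record["commit"] in pending:
                committed.append(pending.pop(record["commit"]))
        else:
            pending[record["tx"]] = record["data"]
    # Anything still pending never committed and is simply discarded.
    return committed
```

In this model a crash between the intent record and the commit record loses that one change, but never leaves a half-applied one behind.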

    4. Re:the mode you are speaking of by Anonymous Coward · · Score: 0

      Hibernate is really only true hibernate if I can take down the process on one machine and bring it up on another machine. This is important for compute farms where different groups have priority on machines and need instant response, while others can't afford for a 7-day simulation to be killed after 6 days. -> Hibernate the process and move it to a different machine.
      No, renice does not work, and just letting it swap out when there is other hardware available does not help either.
      You could use it to do hardware upgrades/fixes without impacting operations (except for the time to dump/load).

    5. Re:the mode you are speaking of by stealthv · · Score: 1

      A JFS deals only with the file system, more specifically the file system's metadata (allocated blocks and such). It protects the file system from being corrupted if the computer is suddenly shut down.
      What this person wants is the ability to save the machine's current 'state'. This would include everything in RAM, the cache, the process stack etc. That way when the computer is rebooted the computer could be restored to the state it was in before shutdown. Running programs and all.

    6. Re:the mode you are speaking of by SilentChris · · Score: 2, Informative

      *Sigh*. More people with very little experience with laptops. Read the mini-faq, people.

    7. Re:the mode you are speaking of by CmdrPinkTaco · · Score: 1

      Thank you - I was not clear on what a JFS did, thus the question. I am a programmer, but tend not to get my hands dirty in OS level type involvements, thus I didn't really grasp the concept of a JFS.

      --
      Please give your mod points to others, Im at the cap. They will appreciate it more
  4. Really worth the effort? by NetJunkie · · Score: 1, Offtopic

    How often is this a problem? If it happens a lot fix the power problem...not the problems after.

    I don't see it worth the time and effort to set something like this up.

    1. Re:Really worth the effort? by b_pretender · · Score: 4, Insightful
      Good point. He should also create numerical algorithms with log files that keep track of how far they are getting and track results.

      This sounds like common sense to me. You never know when the disk is going to poop, the power shut off, the network reset.

      At my old job, we were required to record the status of all jobs that took longer than an hour (on a 6 cpu SGI). They never crashed on their own, but I would usually interrupt them if the requirements changed or whatever. If they ever did crash, then there was a record of exactly where they left off.
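
A progress log of the kind described above could look roughly like this. It is a minimal sketch under assumptions: the job is assumed to split into numbered, independent steps, and `run`, `completed_steps`, and the one-number-per-line log format are hypothetical choices for illustration.

```python
import os


def completed_steps(log_path):
    """Return the set of step numbers already recorded in the log."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path) as log:
        # Ignore a final line without a newline: it may be a torn write.
        return {
            int(line)
            for line in log
            if line.endswith("\n") and line.strip().isdigit()
        }


def run(log_path, total_steps, work):
    """Run work(step) for each step not yet logged, logging as we go."""
    done = completed_steps(log_path)
    with open(log_path, "a") as log:
        for step in range(total_steps):
            if step in done:
                continue  # finished in an earlier run
            work(step)
            # Record the step only after it succeeded, and force it to disk.
            log.write(f"{step}\n")
            log.flush()
            os.fsync(log.fileno())
```

If the job is interrupted, calling `run` again with the same log path skips the recorded steps and continues with the first unfinished one.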

    2. Re:Really worth the effort? by Anonymous Coward · · Score: 0

      How insightful of you! You yourself don't have a use for a feature so you get pissy about it. This can definitely be an essential ability for researchers and others doing extensive number crunching. Sure, we don't give a fuck if you lose your Mozilla browsing session. But if you lose important work, it's a big deal. And it makes no sense for every programmer to create the feature in every application he ever writes. It should be an operating system capability.

    3. Re:Really worth the effort? by Anonymous Coward · · Score: 0

      Power problems aren't the problem. See the Condor system posted above. Process freezing allows for a totally cool new paradigm!

    4. Re:Really worth the effort? by kdawg6000 · · Score: 3, Informative

      If you are a grad student who has been waiting for a month for a job to finish...this could be very important. I was in an engineering department where jobs that ran for weeks were not uncommon (fortunately most of mine only took a day or two). A shutdown of a critical machine could set someone back months.

    5. Re:Really worth the effort? by NetJunkie · · Score: 3

      No, I wouldn't design a totally new memory dump system, I'd keep logs. Have the app keep track of where it is so that should the system restart it can pick back up again. That could be done without new BIOS and memory systems.... And you could do it TODAY with your existing hardware setup.

    6. Re:Really worth the effort? by Airline_Sickness_Bag · · Score: 1

      Any programs we use that take more than a few hours write out restart files every 1-2 hours, so we can restart the job if the machine dies. It also helps to have restart files so you can debug a program that ran for a week and died - you can restart it with debugging on, and repeat the last few iterations.

      Some of us run programs for 1-2 months. My longest was about two weeks, though.

    7. Re:Really worth the effort? by sketerpot · · Score: 2
      Why not do some work once and save all the application developers a lot of work? This is a good idea.

      This could be done without doing anything to your BIOS; you could just dump all the memory allocated to a certain program to disk and put that process in a list of hibernating processes. What's so hard about that?

    8. Re:Really worth the effort? by uchian · · Score: 2

      Of course, if you're running some job which could take a month to finish... you code it so that it can pick up where it left off, or at least so that it will only have lost a couple of hours' worth of work at the most.

      Or is that too sensible?

      (and if it's a proprietary package and it can't pick up from where it left off, find a different one).

    9. Re:Really worth the effort? by Vilmos · · Score: 1

      > I don't see it worth the time and effort to set something like this up.

      And what if you play a game and there is no way to save the state of it? :-)))

      Vilmos

    10. Re:Really worth the effort? by harlows_monkeys · · Score: 3, Insightful

      There are more than power problems to worry about with a long-running process. There are other hardware failures, scheduled downtime, and system crashes to contend with. Just because in this instance it was a power failure that made him wish he had this ability doesn't mean it wouldn't be useful in other circumstances.

    11. Re:Really worth the effort? by TheCarp · · Score: 1

      How about because it's a better solution?

      Ok... "freezing" a process before shutdown is a fine idea...actually... to be really really evil, you could say it's been done... look at the emacs compile closely if you don't believe me. There is a utility in existence that will take a core dump and turn it into an executable! EVIL MAGIC I SAY!

      However, it only solves part of the problem. You have to be able to tell the system to freeze the process before it goes down. What happens if some sysadmin inputs the wrong PID to a kill -9? Or if someone trips over the power cord, or a kernel bug (or particularly nasty disk or disk subsystem lossage) brings the machine down hard BEFORE you can freeze your process?

      Many possibilities in many scenarios.

      End result: Log files work better. In fact, they're what most of the commercial software packages that do long-running computations (Gaussian, just to name one) use.

      -Steve

      --
      "I opened my eyes, and everything went dark again"
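
The distinction drawn above between a polite shutdown and a `kill -9` can be sketched like this. It is an illustration under assumptions: `crunch`, `request_stop`, `install_handlers`, and the `save` callback are hypothetical names, and the "work" is a stand-in loop. SIGTERM and SIGINT can be caught, giving the program a chance to save its state; SIGKILL cannot be caught at all, which is why periodic logs are still needed.

```python
import signal

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only set a flag, the main loop does the real work."""
    global stop_requested
    stop_requested = True


def install_handlers():
    """Catch the signals that can be caught. SIGKILL (kill -9) cannot be."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def crunch(state, total_steps, save):
    """Advance the computation until it is done or a stop was requested."""
    while state["step"] < total_steps and not stop_requested:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
    # Either finished or asked to stop: persist where we got to.
    save(state)
    return state
```

A program would call `install_handlers()` once at start-up (from the main thread) and pass a `save` function that writes `state` somewhere durable.
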
    12. Re:Really worth the effort? by abombss · · Score: 1

      This sounds like common sense to me.

      I once had a little cracker program for cracking WinZip passwords; even this had the ability to shut down a brute-force attack and resume it at a later date, assuming the log file was kept. I would think any software that required long periods of time to run should build this in.

      Even if the OS had some capability to suspend, there is still no guarantee that the program itself would come back up without a glitch.

      --
      "Always give your best, never get discouraged, never be petty..."
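
One way a resumable brute-force search like the one mentioned above could work is to number the candidates, so that the only state worth saving is a single integer. This is a hypothetical sketch, not the commenter's program: `candidate`, `search`, and the lowercase alphabet are assumptions chosen for illustration.

```python
import string

ALPHABET = string.ascii_lowercase


def candidate(index, length, alphabet=ALPHABET):
    """Return the index-th string of the given length, in dictionary order."""
    chars = []
    for _ in range(length):
        # Peel off one base-len(alphabet) digit at a time.
        index, digit = divmod(index, len(alphabet))
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def search(is_match, length, start=0, alphabet=ALPHABET):
    """Try candidates from `start` onward; return (index, word) or (None, None)."""
    for index in range(start, len(alphabet) ** length):
        word = candidate(index, length, alphabet)
        if is_match(word):
            return index, word
    return None, None
```

Writing the current `index` to a log every so often is enough to resume later by passing it back in as `start`.
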
    13. Re:Really worth the effort? by Anonymous Coward · · Score: 0

      Of course it is! I think this would be a good thing to have in everyday life, not only in emergency situations. In practice it means you can shut down your machine, turn it on at a later time and go on with work exactly where you stopped, without having to bother about saving files, etc...
      I think this is a must feature. Now when you have to leave your workplace in a hurry you don't need to worry anymore. Just shut it down and resume work exactly where you stopped at a later time!

      Best regards...

    14. Re:Really worth the effort? by CityZen · · Score: 1

      And what do you do when the program is so obscure that you can't just "find a different one"? Write a new one from scratch?

      Seriously now, the problem is not just programs that run a month. Even interrupting a program that runs for a few hours can be a critical deal when you're working on a big deadline.

    15. Re:Really worth the effort? by uchian · · Score: 2

      And what do you do when the program is so obscure that you can't just "find a different one"? Write a new one from scratch?

      Well, if it just lost you a month's worth of work, then you're in exactly the right frame of mind to go out and do so!

      And of course, if it was open source you wouldn't have to write it from scratch...

      But seriously, if a program is so mission critical (or deadline critical) that it is important not to lose a month's worth of work, and the software has no safeguards to prevent this from happening, and if you can't add any yourself AND you go ahead and use the software anyway... well, you're a fool and deserve everything you get.

      Or at the least, learn a nice important lesson. And then go and rewrite the software.

      And the same deal works no matter what the timescale. If the software isn't up to scratch, then get or make some that is.

  5. WinXP does this on my laptop by jon_c · · Score: 0, Offtopic

    But it's called hibernation. Basically, all the processes are suspended, then the system memory is copied to disk. The tricky part is getting the devices to hibernate. The way MS handles it is that all the active devices have to support the hibernation calls or the entire system won't hibernate.

    I'm sure other OSes have this too; I wouldn't be surprised if someone has done it with Linux.

    -Jon

    --
    this is my sig.
  6. Didn't that exist in VMS? by OSgod · · Score: 1

    If memory serves me right we used to freeze, backup, thaw and go on with life.

    1. Re:Didn't that exist in VMS? by Anonymous Coward · · Score: 0

      VMS developers went to Microsoft. Maybe that's why it's part of MS Operating Systems now.

    2. Re:Didn't that exist in VMS? by Anonymous Coward · · Score: 0

      VMS, MVS, MCP/VS, SGI IRIX; some form of process checkpoint has existed in mainframe and mainframe-quality operating systems since the '60s. There are of course difficulties with certain things; imagine a checkpoint of a process with open network endpoints - you now have to checkpoint the related process(es) on the other end of the network endpoints as well. This can get quite complicated. However, for long-running computational processes, checkpoint-restart capabilities are quite useful (which is why you'll find them in SGI IRIX and Cray operating systems).

  7. Saving application state by cheezehead · · Score: 2, Insightful

    Of course, you could write your application so that it saves state at regular intervals (aka checkpointing). Especially with calculations you should be able to store intermediate results.

    --

    MSN 8: Now Microsoft even has bugs in their ad campaigns.
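
Application-level checkpointing as described above might be sketched like this. It is a minimal illustration under assumptions: the computation is a stand-in (a sum of squares), and `save_checkpoint`, `load_checkpoint`, and the `every` interval are hypothetical names. The state is written to a temporary file and then renamed into place, so a crash during a save leaves the previous checkpoint intact.

```python
import os
import pickle


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def sum_of_squares(n, path, every=1000):
    """Sum i*i for i < n, saving intermediate state every `every` iterations."""
    state = load_checkpoint(path, {"i": 0, "acc": 0})
    while state["i"] < n:
        state["acc"] += state["i"] ** 2
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

Re-running `sum_of_squares` with the same `path` after an interruption picks up from the last saved state rather than from zero, losing at most `every` iterations of work.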

    1. Re:Saving application state by Schmerd · · Score: 1

      This reminds me of the situation that I came across while working in the computer labs in college. After people lost their paper a few times, they learned to click Save every so often. That's much easier than some fancy hibernation system people are talking about.

    2. Re:Saving application state by FortKnox · · Score: 1

      True... but... hibernation works across all applications, while autosaving only helps one application...

      --
      Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    3. Re:Saving application state by joshsisk · · Score: 1

      I think the original poster was referring to a process which takes weeks to calculate, such as a complicated 3D render, or a statistical analysis. Not a term paper.

      You can't just hit "save" halfway through in either of these situations, because the application is busy.

    4. Re:Saving application state by Schmerd · · Score: 1

      Sure you can, if you fix the application. Complex calculations are not just a single operation; they're a series of smaller operations that are strung together, resulting in a more complex algorithm. What I'm suggesting is to alter the application so that it saves checkpoints along the way.

    5. Re:Saving application state by charon_on_acheron · · Score: 1

      Nevermind, we found a better solution.
      We upgraded our 80286 server with 1MB of RAM to a Pentium 4 with 1GB of RAM.
      We ran the computation again, and it finished in about 5 seconds.

    6. Re:Saving application state by joshsisk · · Score: 1

      Yeah, but the poster is talking about a solution for the end user, not the developer of the original software product(s). If the software he's using had that built in, he wouldn't have posed the question.

    7. Re:Saving application state by joshsisk · · Score: 1

      You think you can render broadcast (or film) quality CG scenes in 5 seconds on a Pentium 4?

      Think again.

    8. Re:Saving application state by Binestar · · Score: 2

      Seems to me this guy has figured out that as well. Only he only lost his "paper" once, and wants a solution to "save" it while working on it.

      Hibernation just happens to be the version of saving he wants to implement.

      --
      Do you Gentoo!?
    9. Re:Saving application state by charon_on_acheron · · Score: 1

      First, you must have missed the fact that I was making a joke.

      Second, I never said anything about computer generated scenes.

      Third, if a Pentium 4 can't render a 1-second long cg scene within 5 seconds, that would make for a very long movie. "The second frame of the film was beautiful, but I missed the third frame because I had to use the bathroom, because of that large soft drink I had during the rendering of the first frame." I guess I'm reading it wrong, because it seems you should be able to render a 5 second scene in 5 seconds. But that's not my point.

      Fourth, my point was to make a joke.

      Fifth, the joke was, that the solution was simply to replace a verrrrrry slow system with a much faster new system. Obviously, no one would post the question that shawarma did, on Slashdot, if they were using a 286 PC as a server. But if someone is using a 286 PC as a server, and a process takes several weeks to complete, and they upgraded to a Pentium 4 class system, the same process would be done very soon.

      (God, does that sound like something from "My Cousin Vinny"?)

    10. Re:Saving application state by Anonymous Coward · · Score: 0

      if you are writing the program, you could use signal handling - the user can tell your program to stop by simply using the kill command to send some signal to your process, and the handler will take care of saving the temporary results to the disk. not too hard to organize it this way, i presume. still, a "suspend process" feature would be nice, but i doubt it can be realized for a single process - processes often communicate with one another using mailboxes, pipes, parent processes wait for their children to die, etc - and yet a "suspend system" feature would not be too difficult to implement, i hope :)
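
      A rough Python sketch of the signal-handler approach, assuming a POSIX system and a loop-shaped computation. The state file path and the `on_step` hook are hypothetical and exist only to make the example easy to exercise. The handler just sets a flag; the main loop does the actual disk write, which keeps the handler itself trivial.

```python
import json
import os
import signal
import tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "partial_result.json")  # hypothetical path
save_requested = False


def request_save(signum, frame):
    """Signal handler: only set a flag; the main loop does the actual I/O."""
    global save_requested
    save_requested = True


def compute(n, state_file=STATE_FILE, on_step=None):
    """Sum 0..n-1; if SIGUSR1 arrives, dump progress to disk and stop early."""
    global save_requested
    save_requested = False
    signal.signal(signal.SIGUSR1, request_save)
    total = 0
    for i in range(n):
        if on_step is not None:
            on_step(i)  # hook used only for demonstration
        if save_requested:
            with open(state_file, "w") as f:
                json.dump({"next_i": i, "total": total}, f)
            return None
        total += i
    return total
```

      From a shell, `kill -USR1 <pid>` would trigger the save. This covers only the state the program chooses to write out, so pipes, child processes and the like are still the program's problem.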

    11. Re:Saving application state by joshsisk · · Score: 1

      if a Pentium 4 can't render a 1-second long cg scene within 5 seconds, that would make for a very long movie. "The second frame of the film was beautiful, but I missed the third frame because I had to use the bathroom, because of that large soft drink I had during the rendering of the first frame." I guess I'm reading it wrong, because it seems you should be able to render a 5 second scene in 5 seconds.

      It can take a computer hours to render a CG scene. How long do you think it took to render "the Spirits Within", the same as the length of the movie? Please. For the really big stuff, it takes a render farm.

      I'm sure the same holds true for heavy math calculations for science applications, though I don't do that sort of thing myself.

      I think you think I'm talking about in-game cut scenes, or something. I'm not. I'm talking about rendering CG for output to film or video. I know I'd like to be able to "freeze" a render, or set a machine to just work on rendering when I'm not at the computer... I'd imagine high-end set ups can already do this, but I don't have access to a render farm.

    12. Re:Saving application state by charon_on_acheron · · Score: 1

      Sorry for previous post's attitude. It's just you took the post way too seriously, since it was just a joke. And I honestly have no idea how long it takes to render a scene using CG. I was looking at the concept more from a consumer point of view. "Well my DVD player renders the scene as the movie plays, so that's a 1 for 1 time ratio." My only knowledge of rendering that stuff is from watching "The Making Of..." type shows. And they don't get into much detail.

  8. Not recommended by iiii · · Score: 1, Funny

    I think that would exceed the recommended operating temperatures for your hardware. But on the up side, we might see the head (?) of your box on Futurama.. ;-)

    --
    Light cup, beer drink, thin so chain, neck turtle fat, man I won't say it again
    1. Re:Not recommended by Anonymous Coward · · Score: 0

      /. moderators bite moose - this is at least mildly funny.

  9. External dependencies by interiot · · Score: 3, Insightful

    External dependencies might include open files (what if you freeze, and then delete the file?), open TCP sockets to daemons elsewhere that wouldn't get frozen, sub processes, etc... These would probably have to be revived, but how?
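
    For the open-file case, one possible answer can be sketched in Python: record the path and offset at freeze time, then reopen and seek at thaw time, failing loudly if the file has gone. `freeze_file` and `thaw_file` are hypothetical helper names, and this does nothing for sockets or subprocesses.

```python
import os


def freeze_file(f):
    """Record what is needed to reopen a file later: its path and current offset."""
    return {"path": os.path.abspath(f.name), "offset": f.tell()}


def thaw_file(record, mode="rb"):
    """Reopen the file and seek back; fail loudly if it vanished while frozen."""
    if not os.path.exists(record["path"]):
        raise FileNotFoundError("file deleted while frozen: " + record["path"])
    f = open(record["path"], mode)
    f.seek(record["offset"])
    return f
```

    A file that was replaced by a different one with the same name would slip through this check; comparing inode numbers from `os.stat` would be one way to tighten it.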

  10. We do it in Condor by epaulson · · Score: 5, Informative

    http://www.cs.wisc.edu/condor/

    Free-as-in-beer, on most major UNIX platforms. Check out our publications, we have several that give all the details you'd need to write it yourself.

    Plenty of others, too - libckpt, there was a "Checkpointing Threaded Programs" paper at USENIX this past summer... there are some kernel patches that can do it, most of them under the GPL.

    1. Re:We do it in Condor by Anonymous Coward · · Score: 0

      Free as in beer is a fine first step. I wonder why you don't release the source though? I bet tons of people would use a system like yours. It looks like it's been around for, what, a decade or so? It doesn't look like it's the profit motive preventing you from doing so. So, let us help! :-)

    2. Re:We do it in Condor by dsouth · · Score: 5, Informative

      As the poster said, there are plenty of others:

      • SGI IRIX and Cray UNICOS provide kernel-level checkpoint-restart.
      • Condor provides user-level checkpoint restart and process migration by manipulating libraries at runtime.
      • esky provides user-level checkpoint restart under Solaris and Linux via runtime library manipulation.
      • crak provides kernel-level checkpoint restart for linux.
      • cocheck provides user-level checkpoint-restart.
      • libckpt provides user-level checkpoint-restart.


      I'm sure I left several out. Checkpoint-restart has been part of the high-performance computing scene for years. Having been a sysadmin on large, high-performance, computing platforms for the last few years of my professional life, my experiences with checkpoint-restart have been a mixed bag. All of the existing systems have limitations. Depending on the application, those limitations can be no problem, or they can be deal-breakers.
    3. Re:We do it in Condor by Anonymous Coward · · Score: 2, Funny
      The label "free as in beer" is misleading, due to the cultural differences between Wisconsin and other parts of the world.

      The people of Wisconsin are fat, stupid, drunken oafs. They consider themselves "America's Dairyland", although this title was taken from them many years ago by the state of California. This is not the only false claim to fame that the state keeps. Green Bay Packers fans consider their city to be "Titletown, U.S.A.", because of the numerous NFL championships the team has won. The numbers may seem impressive, but the majority of them were won when the NFL was a small league, and there was no playoff for the championship.

      Getting back to the fatness, they do produce and consume a lot of dairy, but this is not why they are called "cheeseheads". A little known fact is that most of Wisconsin's citizens are inbred, and even those that aren't inbred frequently suffer birth defects, due to maternal alcoholism. This results in a condition that produces small holes in the skull, where fluids escape and eventually congeal into small, yellow lumps, hence the term "cheesehead". Hence, the traditional Packer "cheesehead hat" is actually a symbol of Wisconsin's perseverance in the face of a world that looks down upon inbreeding.

      Getting to the point, Wisconsinites _crave_ beer to feed their alcoholism, so much so that beer is an extremely valuable commodity, despite the abundance of breweries throughout the state. In fact, the Leinenkugel's brewery of Chippewa Falls goes so far as to indicate the value of its beer on the label of their original lager -- "Leinie's Original" is "Good as Gold".

      So you see, we still haven't found an English word or phrase quite as good as "libre" -- "free as in beer" can be just as ambiguous as the word "free" is by itself.

    4. Re:We do it in Condor by Anonymous Coward · · Score: 0

      Yo, moderator!

      mod this guy up! what are you waiting for???

    5. Re:We do it in Condor by servanya · · Score: 1

      if I was a mod:

      -5 retard / who the fuck cares!?!

    6. Re:We do it in Condor by Anonymous Coward · · Score: 0

      You forgot EROS
      http://www.eros-os.org/

  11. OS X needs this especially by kilgore_47 · · Score: 5, Interesting

    for the "Classic" environment. It seems so stupid watching macos9 boot up in a window when you want to use a classic program; Apple ought to save the state of the classic environment in to a file that could be quickly reloaded into ram when classic is called for. As the blurb said, laptops have had the suspend feature for years; would it really be so hard to apply the same concept elsewhere?

    --
    ___
    The way to see by faith is to shut the eye of reason. --Ben Franklin
    1. Re:OS X needs this especially by kilgore_47 · · Score: 1

      sorry, that's the hibernate feature (not suspend)

      --
      ___
      The way to see by faith is to shut the eye of reason. --Ben Franklin
    2. Re:OS X needs this especially by medcalf · · Score: 2

      Well, OS X certainly can sleep (both OS X and Classic go to sleep), putting to sleep also all processes. As to hibernating the Classic environment, I don't know how useful that would really be in the long run.

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    3. Re:OS X needs this especially by Masker · · Score: 2

      Errrr... Without protected memory spaces, I _don't_ think that this is what you want. You'd actually be setting yourself up for more problems. You don't want to save the system's memory state unless you can be sure that it's relatively clean & safe...

      --

      ---------The early bird gets the worm, but the second mouse gets the cheese.

    4. Re:OS X needs this especially by iso · · Score: 2

      I think what he means is save the clean boot-up state of the classic environment (provided nothing has changed in the System folder since the last boot of classic). That way when classic needs to boot, OS X could just throw up a booted classic environment memory state in a matter of seconds instead of booting classic from scratch each time.

      - j

    5. Re:OS X needs this especially by medcalf · · Score: 2

      You'd have to define what you mean by "nothing has changed in the System folder", since prefs, for example, can change all the time. I suppose if you checked the image against the latest modification time of all files in the system folder, and threw away the image if the image was older than any file, it would work, but it seems that it could be pretty time consuming to do.
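
      The modification-time check described here could look roughly like this in Python. `image_path` and `folder` are hypothetical stand-ins for the saved Classic image and the System folder, and the walk does have to touch every file, which is the cost being worried about.

```python
import os


def newest_mtime(folder):
    """Return the latest modification time of the folder or anything under it."""
    latest = os.path.getmtime(folder)
    for root, dirs, files in os.walk(folder):
        for name in dirs + files:
            latest = max(latest, os.lstat(os.path.join(root, name)).st_mtime)
    return latest


def image_is_fresh(image_path, folder):
    """True if the saved image is at least as new as everything in `folder`."""
    return os.path.getmtime(image_path) >= newest_mtime(folder)
```

      If `image_is_fresh` returns False, the image is thrown away and the environment boots normally.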

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    6. Re:OS X needs this especially by Quixotic+Raindrop · · Score: 2, Interesting

      Which is funny, because VMware has exactly this capability.

      It needs some refinement, and sometimes it's slow when it picks back up again, but it generally works in my experience. It is obviously not only possible, but implementable using current technology.

      --
      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
    7. Re:OS X needs this especially by suwain_2 · · Score: 2
      Maybe I'm misunderstanding you, but if not... I just got an idea that's just a slight twist of yours.

      Why not just suspend the entire system to the hard drive...? The system could simply read the way your memory 'should be', and quickly copy it over. I don't have a lot of experience with how things boot, but this seems like a good idea to me...? It should be limited only by your hard drive's speed...?

      --
      ________________________________________________
      suwain_2 :: quality slashdot p
    8. Re:OS X needs this especially by Suppafly · · Score: 2

      Why not just figure out a better way to run old apps than to boot up basically the entire old os.. windows2000 can run dos and win3.1 win95 etc apps without loading the entire old kernel/os. Realistically, Apple could do the same if they spent a little bit more time on the problem.. but then you wouldn't have all the other cool advances to MacOS..

      OS X needs this especially for the "Classic" environment. It seems so stupid watching macos9 boot up in a window when you want to use a classic program; Apple ought to save the state of the classic environment in to a file that could be quickly reloaded into ram when classic is called for. As the blurb said, laptops have had the suspend feature for years; would it really be so hard to apply the same concept elsewhere?

    9. Re:OS X needs this especially by Anonymous Coward · · Score: 1, Insightful

      my personal preference is to not run Classic apps...I think Apple made a smart call saying "why work hard on something that will be useless in 2 years anyway?"

    10. Re:OS X needs this especially by Dajur · · Score: 1

      That would be cool. Kinda like VMWare does with its virtual machine OS's.

    11. Re:OS X needs this especially by Anonymous Coward · · Score: 1, Interesting

      No, Apple could not do this even if they spent "a little more time on the problem," and they are already spending lots of time on the problem. Fundamentally, many of the ways old Mac programs are written cannot work on a system where applications can be preempted and where application memory can be swapped out at any point. To clarify, Windows 2000 cannot run all DOS, Windows 3.1, and Windows 9x applications. That is part of the reason it took MS so long to transition to the NT kernel ... getting apps up to speed. For reasons beyond the scope of this thread, Apple is making a much faster transition than MS did.

    12. Re:OS X needs this especially by ncc74656 · · Score: 5, Interesting
      Well, OS X certainly can sleep (both OS X and Classic go to sleep), putting to sleep also all processes. As to hibernating the Classic environment, I don't know how useful that would really be in the long run.

      I don't know how directly comparable this example might be, but I used to use VMware (under Linux) to suspend Win98 when I didn't need it. If I needed to do something under Win98 (like browse the web), VMware would load up Win98 where I last left it. It saved the minute or so of waiting for the VM to POST and load Win98.

      (If VMware provided better support for DirectX, I might not have needed to switch my home workstation from Linux to Win2K. It's been more than a year since I checked, though, so things might've improved.)

      --
      20 January 2017: the End of an Error.
    13. Re:OS X needs this especially by passion · · Score: 2

      You'd have to define what you mean by "nothing has changed in the System folder", since prefs, for example, can change all the time.

      Preferences get written and re-written all the time. In fact, classic versions of Mac OS can be booted w/out anything in the Preferences folder. You drop a good point here, but this is a poor example. I see no reason for it not to work just fine, and would love to see Apple implement this.

      --
      - passion
    14. Re:OS X needs this especially by Dwonis · · Score: 2

      It was called suspend-to-disk until Microsoft called it hibernate.

    15. Re:OS X needs this especially by The+Raven · · Score: 2

      Without protected memory space? Maybe I'm misunderstanding your disagreement, but OSX *does* have protected memory. It is OS9 and prior that do not.

      --
      "I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
    16. Re:OS X needs this especially by Anonymous Coward · · Score: 0

      Since Classic and OS X share the same filesystem and use many of the same files, it would be problematic to freeze the state of Classic, change files around while it's asleep, and then wake it up to an inconsistent filesystem.

    17. Re:OS X needs this especially by Lazaru5 · · Score: 2

      W2K can run older Windows apps because it's all still Win32. OSX is Unix/Carbon/Cocoa and can't just run old MacOS apps at will, hence the Classic OS boots.

      A closer comparison would be how FreeBSD supports Linux binaries via a "thunking" layer -- translating Linux syscalls to BSD syscalls. Again, this is fairly easy to do since it's still the Unix API. The old Trumpet 32bit Winsock did the same thing by translating 32bit calls to 16bit ones.

      Apple could probably have easily made a thunking layer that would at least run classic binaries using Mach syscalls, but drawing the windows themselves might not have been as easy to support natively.

      Then again, can't LinuxPPC and friends run classic apps? Or PPC BeOS?

      --

      --
      My comments and opinions completely reflect those of anyone and anything I am remotely associated with.
    18. Re:OS X needs this especially by Jace+of+Fuse! · · Score: 2

      IBM has been calling it Rapid-Resume for years.

      --

      "Everything you know is wrong. (And stupid.)"

      Moderation Totals: Wrong=2, Stupid=3, Total=5.
  12. Customer Demand by routerwhore · · Score: 1

    Sounds like a great feature that has actually been implemented on some platforms. But until it starts catching on as a trend and other people figure out its usefulness it won't reach the general masses unfortunately. Customer demand and survival of the fittest will dictate if someone picks up the ball and runs with the idea. Try setting up an advocacy website and mailing list to turn your words into actions.

  13. Shouldn't be difficult, I think... by FortKnox · · Score: 1

    I'd think (I used 'think'!) that if you had control of the UPS software that talked to the OS, and of the OS itself (yay, linux!), it shouldn't be a hard process.

    The UPS control software says it's running outta juice, the OS then saves all the memory to disk, and sets a flag, so on startup, it remaps all the memory back.

    Then again, I'm not a big assembly level programmer, so I'm sure it's more complex than this...

    --
    Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    1. Re:Shouldn't be difficult, I think... by EinYidden · · Score: 1

      VM/370 used to do this kind of thing (Saved Segments I think). You just save all memory to disk, have code in boot to reload and Voila. The trick might be to have a dummy entry in LILO or GRUB that boots a loader program to reload and restart. Not really that hard I think. ;-)

    2. Re:Shouldn't be difficult, I think... by Anonymous Coward · · Score: 0

      Maybe linux just needs another run level for hibernate modes. You don't need to shut down all services before the box goes into a freeze, you'll just want to shut down DB services and network services before you hibernate the machine.

  14. BeOS? by ScumBiker · · Score: 2

    I had Be installed for a while and I thought it would do that. I do know I never lost anything due to it crashing. Of course, it didn't crash much. I think using a journaled file system or at least soft-updates would be a good start. Frankly, I have no idea how to code something similar to Win XP hibernate. Shouldn't be that hard though.

    --
    --- Think of it as evolution in action ---
    1. Re:BeOS? by Anonymous Coward · · Score: 0

      mod the above BeOS post down, offtopic/stupid

    2. Re:BeOS? by nigelo · · Score: 1

      >Frankly, I have no idea how to code something similar to Win XP hibernate.
      >Shouldn't be that hard though.

      You have all the credentials for Vice-President of Estimating and Planning, then. Great post!

      --
      *Still* negative function...
  15. Search on "Checkpointing" by crow · · Score: 3, Redundant

    What you want is known as "checkpointing."

    There have been a number of projects that do this under Unix over the years. Many of them do it for the purpose of process migration. Others do it just for recovery.

    One such project that I used in the early 90s was Condor.

    The typical approach is to do something along the lines of forcing a core dump and then doing some magic to restart the process from the core file.

    1. Re:Search on "Checkpointing" by duplicate-nickname · · Score: 2, Informative

      The condor project is still alive and well: http://www.cs.wisc.edu/condor/ and should do what this guy wants to accomplish (but not what he's asking).

    2. Re:Search on "Checkpointing" by Anonymous Coward · · Score: 0

      perhaps another solution would be a clustered environment like amoeba or VMS. If one node is going down, just transfer its processes to another one.

  16. emacs does it... by EdA · · Score: 1

    This has been done in GNU Emacs for years - at the process level. I used to use some commercial EDA (Unix) software which required some of the source from emacs (unexec.c rings a bell) with some modifications.

    1. Re:emacs does it... by Anonymous Coward · · Score: 1, Funny

      yea, well Vi has done it for 10's of years. :)

    2. Re:emacs does it... by EdA · · Score: 1

      /* Copyright (C) 1985,86,87,88,92,93,94 Free Software Foundation, Inc.

      This file is part of GNU Emacs.

      GNU Emacs is free software; you can redistribute it and/or modify
      it under the terms of the GNU General Public License as published by
      the Free Software Foundation; either version 2, or (at your option)
      any later version.

      GNU Emacs is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
      GNU General Public License for more details.

      You should have received a copy of the GNU General Public License
      along with GNU Emacs; see the file COPYING. If not, write to
      the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
      Boston, MA 02111-1307, USA. */

      /*
      * unexec.c - Convert a running program into an a.out file.
      *
      * Author: Spencer W. Thomas
      * Computer Science Dept.
      * University of Utah
      * Date: Tue Mar 2 1982
      * Modified heavily since then.
      *
      emacs-21.1/src/unexec.c line 1/1267 2%

    3. Re:emacs does it... by Anonymous Coward · · Score: 0

      I've never used emacs, so I might be way off, but doesn't emacs core dump itself upon exit, and next time it starts it reads the dump into the core to speed up the start process?

  17. Hmm, VMWare can do this in a different way. by GeorgieBoy · · Score: 5, Interesting

    VMware suspends to disk. You can go as far as suspending the Virtual Machine, not Virtual Memory. Then copy the "data" files to another machine and resume the same suspended virtual machine like nothing ever happened, as long as the same basic hardware exists on the host system (e.g. NIC, sound, serial ports, etc).

    While this isn't quite what you are looking for, it spawns an idea of the level this can be taken to. Think of how neat it is for distributed applications. Of course, something like this has to exist somewhere. . .

  18. Connectix Virtual PC does this... by rekoil · · Score: 1

    Its "quick start" feature saves the contents of RAM to disk, just like XP's Hibernate function. When you start it up, the system is just how you left it, apps and all.

  19. Extended core dump? by The+G · · Score: 5, Interesting

    Almost all of the stuff you need is already in a core dump. Perhaps the appropriate approach to this is to try to extend the core-dumping mechanism to also dump other pieces of state. Then you would just need a way to reconstruct process state from a core dump, which most runtime debuggers can almost do anyway.

    I suspect that all the pieces of a solution are written and it's just a tricky pick-choose-and-integrate problem.

    And damn but I'd love to have this ability.
    --G

    1. Re:Extended core dump? by Anonymous Coward · · Score: 0

      this might be a very good idea. why not use gdb to load the core file and let the debugger finish the calculation?

    2. Re:Extended core dump? by cacheMan · · Score: 1

      It wouldn't be all that difficult to write a signal handler that would catch a user defined signal (you can send a signal to a process using the kill command). When the signal is caught, dump core, do something silly like dereference a zero pointer in your signal handler. Then when you reload your application in gdb, jump over the offensive line where you dereference zero and return without killing the process (most signals will eventually exit, but not yours!). There is a big problem though. How long does it take to dump core? It can sometimes take a little while. Do you have enough disk space for the core file? Other than that, it seems doable.

    3. Re:Extended core dump? by ADRA · · Score: 2, Interesting

      You forget that the kernel has created a sandbox for this core to live in. If the sandbox wakes up with a different environment, byebye process.

      Simple example

      # ./bigwasteoftime &
      ./bigwasteoftime[1]
      # hibernate bigwasteoftime
      # exit

      The program is tied to the console which no longer exists, and if woken up, which process is it childed to? What if bigwasteoftime knew its parent before hibernation, and tried to modify it?

      As it stands, you cannot guarantee its stability.

      --
      Bye!
    4. Re:Extended core dump? by Anonymous Coward · · Score: 0

      You could simply call the abort() function, then assuming ulimit was set right you'd get a core dump without the problem of recovering from a seg fault.

      Rich.

    5. Re:Extended core dump? by ianezz · · Score: 4, Interesting
      GNU Emacs basically does this to reduce initialization times.

      When compiling Emacs from the sources, the initial executable file is only a (relatively) small virtual machine executing elisp bytecode.

      Then, it is started, and several basic elisp packages are loaded and initialized.

      Once initialized, it makes a dump of itself on a file on disk (IIRC actually dumping core by sending a fatal signal to itself).

      The dump is prepended with an appropriate loader which restores the Emacs process (in its initialized status) in memory, and the resulting file is used as the main Emacs binary (what you can usually find in /usr/bin).

      This works for Emacs because it knows when it is checkpointed, and special care is taken not to do anything that depends on parts of the running environment that can't be fully restored.
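
      The same dump-after-initialization pattern can be imitated at a much smaller scale in Python. This is only an analogy, not how unexec works: it serializes one data structure with pickle rather than the whole process image, and `slow_init` and `dump_path` are hypothetical names.

```python
import os
import pickle


def slow_init():
    """Stand-in for expensive startup work (hypothetical)."""
    return {"squares": [i * i for i in range(1000)]}


def start(dump_path):
    """Return (state, restored): load the dump if present, else initialize and dump."""
    if os.path.exists(dump_path):
        with open(dump_path, "rb") as f:
            return pickle.load(f), True
    state = slow_init()
    with open(dump_path, "wb") as f:
        pickle.dump(state, f)
    return state, False
```

      As with Emacs, this only works because the program decides when the dump happens and keeps anything that can't be restored (open files, sockets) out of the dumped state.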

    6. Re:Extended core dump? by Anonymous Coward · · Score: 0

      This capability, in turn, is a Poor Man's version of the Lisp Machine's ability to "save world" or "save band." Basically, the paging file is saved out, and the machine can boot one of these paging files (actually that's the normal way to boot, you just boot the world/band the vendor gave you) and instantly restore everything to exactly the way it was when you "saved the world" (I am over simplifying a little -- there is some init code that gets run, when the band is either "warm" booted or "cold" booted).

      Think about the elegance of this. No stupid ".ini" or "rc" files. All applications' restart ability "just works." No code bloat due to ini/rc routines. Everything exactly the way you left it. No stupid, time-consuming, pre-loading of DLLs to hide how slow and bloated applications or desktops are (KDE, MS Windows).

      Those who do not understand the Lisp Machine are doomed to reinvent it. Poorly.

    7. Re:Extended core dump? by Dwonis · · Score: 2

      How about this? Trap SIGSTOP, but then stop anyway. (let the kernel handle the rest). When your process wakes up, re-initialize whatever you have to.
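
      One caveat: on POSIX systems SIGSTOP itself cannot be caught, blocked or ignored, so the part a process can actually hook is the wake-up, via SIGCONT (or SIGTSTP for the terminal-generated stop). A minimal Python sketch, where the `wakeups` list and the body of the handler are placeholders for whatever really needs re-initializing:

```python
import os
import signal
import time

wakeups = []


def on_continue(signum, frame):
    """Runs when the process is resumed with SIGCONT; re-initialize here."""
    wakeups.append(time.time())  # placeholder for reopening sockets, re-reading clocks, etc.


signal.signal(signal.SIGCONT, on_continue)
```

      With this installed, `kill -STOP <pid>` followed later by `kill -CONT <pid>` lets the kernel do the freezing while the process gets a chance to fix itself up on resume. The memory still lives in RAM or swap the whole time, so this does not survive a power-off.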

    8. Re:Extended core dump? by k98sven · · Score: 1

      Well if that ain't Emacs I don't know what is:
      Every conceivable feature is already there..

  20. hhgttg by Score0,+Overrated · · Score: 3, Funny

    The job it was doing would have been done in a few days,

    In that case, Arthur Dent should know the answer.

    1. Re:hhgttg by libertynews · · Score: 2

      No, no, we already know the answer (42). It's the bloody question that is so elusive!

      Brian

      --
      Remember Lexington Green!
  21. eros-os by ischarlie · · Score: 2, Interesting

    back in the day there was a post:

    http://slashdot.org/article.pl?sid=99/10/28/0151212&mode=thread

    about an operating system with "journaled" processes of a sort, that would automatically back up images of its processes.

    1. Re:eros-os by Anonymous Coward · · Score: 1, Informative

      Eros has in its version 2 spec moved away from this idea implemented for all processes because of difficulties with retaining state for various new interfaces such as USB. But the core idea remains for some processes I believe. Not sure what the current state of play is with eros as it seems to have a lower profile than it deserves. God forbid geeks more interested in coding than talking.

      check out for more details.

      hopefully Jonathan Shapiro will add more details here if he sees this thread.

  22. hibernation by elixx · · Score: 0

    many laptops i've seen have built in hibernation stuff in the bios, which did something like create a partition of ~300MB and stored the current state of the system (ram contents, etc) to that partition, which it would reload to memory when it is started again.
    i'm sure there are more details to it. i've seen this done on a number of IBM thinkpads...
    would it be possible to create patches for current BIOS revisions that could hack in support for something like this?

    --
    No, Beowulf clusters can't imagine in Soviet Russia.
  23. you can by Lumpy · · Score: 5, Informative

    It's called software suspend for linux. look for it on freshmeat.net

    --
    Do not look at laser with remaining good eye.
    1. Re:you can by Lumpy · · Score: 5, Informative

      AHA! I knew I still had it
      http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

      this is what you need.

      --
      Do not look at laser with remaining good eye.
    2. Re:you can by Anonymous Coward · · Score: 2, Insightful

      Talk about the ultimate in karma whoring. Instead of just having one post modded to +5, you get two by delaying the posting of your link. It's almost criminal.

    3. Re:you can by Anonymous Coward · · Score: 0

      KARMA WHORE ! HE GOT TWO FIVES OUT OF THAT SHIT. JUST FOR A DAMNED FRESHMEAT SEARCH.

      Beating the lameness filter now....

      asdfasdf asdfasdfasdf asdfasdfasdfasdf asdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf asdfasdfasdfasdf

    4. Re:you can by Anonymous Coward · · Score: 0

      This page describes the system suspend-resume feature on Solaris and also has a useful list of things that won't work and to be aware of while a system is suspended. This could be useful to anyone implementing or using a similar feature.

    5. Re:you can by Anonymous Coward · · Score: 0

      and I could have gotten first post easily too...

      you wannabees just suck at it.

      Muahahahahahahahahahahaha!!!!

    6. Re:you can by i_am_nitrogen · · Score: 3, Informative

      There's just one tiny little problem with that. It only supports ext2. Try it with a journalling filesystem, and ... bye bye Linux partition!
      At least, last time I checked that's how it was. There may have been improvements made. It would require somewhat major changes to the VM and each filesystem in the current Linux implementation to get it working with journalled systems, or if Linux finally gets a journal-capable VM (similar to IRIX's, perhaps), it would just require some VM changes if it's done right.

      (Begin semi-OT stuff)
      Oh, and please, please everyone ask Linus not to rip out memory zones just because it's a BSD-like idea.

      Kernel 2.6 will probably be able to support hibernation without funkiness in the filesystems themselves, just a good VM setup. The new framebuffer system (Ruby) will rock, too (think 'echo "640x480-16@60" > /dev/gfx/fb/0/mode'), especially because DRI is going to be separated from X so console applications can take advantage of OpenGL as well.

    7. Re:you can by scrytch · · Score: 2

      Insightful my ass.

      Yunno, some people hit the 50 cap long ago. Some never cared. I thought this whinging over so-called "karma whoring" had died long ago (I was thinking of changing my sig), but I guess there are some people still left who are socially stunted enough that they cannot conceive of others partaking in conversation for the fun or edification instead of pleas for attention. I thought I was kind of messed up, but I can't say that I feel particularly validated or not based on some score I have on slashdot.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
  24. Saving Calculations by Anonymous Coward · · Score: 0

    Search the web for the keyword "checkpointing". There's some publicly available checkpointing support from MIT, and some of the scientific-oriented Linux sites like Beowulf probably have these libraries available.

    The idea is that a program doing a long calculation periodically dumps state and can restart from the last saved dump if necessary.
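
    A minimal sketch of that idea in Python, not taken from any of the libraries mentioned. The function names (load_checkpoint, save_checkpoint, run), the JSON file format and the toy sum-of-squares workload are all assumptions made for illustration. The state is written to a temporary file and renamed into place, so a power cut during a save should leave the previous checkpoint intact.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_i": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, n, every=1000, stop_at=None):
    """Sum i*i for i in range(n), checkpointing every `every` iterations.

    `stop_at` is a hypothetical hook that simulates a crash: the loop
    returns early without saving whatever happened since the last dump.
    """
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        if stop_at is not None and i == stop_at:
            return None  # simulated power cut
        state["total"] += i * i
        state["next_i"] = i + 1
        if state["next_i"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

    Calling run() again with the same path after an interrupted run picks up from the last dump, so at most `every` iterations are repeated.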

  25. STOP TALKING ABOUT LAPTOPS! by Anonymous Coward · · Score: 1, Flamebait

    The question is about process cryogenics, not about how well your stupid laptops hibernate!

    1. Re:STOP TALKING ABOUT LAPTOPS! by Anonymous Coward · · Score: 0
      It's the same thing, freak. Hibernation is just more widely used on Laptops because they have such a short battery life, unlike PCs that just run off a power supply that usually isn't interrupted.

      Do the IBM Linux laptops come with this feature? I would assume so. I think it's built into the CMOS and not the actual OS for those laptops and saves it to a whole separate partition.

    2. Re:STOP TALKING ABOUT LAPTOPS! by Anonymous Coward · · Score: 0

      WRONG.

      Process freezing requires support built into the kernel. Hibernation has very little to do with software at all.

    3. Re:STOP TALKING ABOUT LAPTOPS! by Anonymous Coward · · Score: 0
      Laptops aren't the only boxes that do suspend these days. Most "green" PCs have a suspend of some sort. The problem is, on servers this is usually not configured.

      Did the person who started this thread even *TRY* doing an "apm -s" on the server in question?

      Sean

  26. process migration is the term you want by Danny+Rathjens · · Score: 2, Interesting

    There has been a lot of work done on "process migration". That is moving processes from machine to machine.
    Obviously those techniques would apply to what you are asking about.
    google has lots of links about it

  27. it's encrypted in your brain waves! by spacefem · · Score: 5, Funny

    I once had an enourmous computer working out a very important question but it was destroyed by Volgons five minutes before it was finished. I feel your pain.

    1. Re:it's encrypted in your brain waves! by medcalf · · Score: 2
      I once had an enourmous computer working out a very important question but it was destroyed by Volgons[sic] five minutes before it was finished. I feel your pain.

      That must have annoyed the Vogons, who were coming to do the same thing. Not to mention the mice!

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    2. Re:it's encrypted in your brain waves! by Restil · · Score: 1, Redundant

      That's VOGONS and it was 10 minutes, not 5.

      -Restil

      --
      Play with my webcams and lights here
    3. Re:it's encrypted in your brain waves! by Don+Negro · · Score: 2

      Also, it was encoded, not encrypted.

      But what's a few letters among severe geeks.

      --

      Don Negro
      Perl 6 will give you the big knob. -- Larry Wall

    4. Re:it's encrypted in your brain waves! by frunch · · Score: 1

      Yeah, the original poster should check into non-lossy compression algorithms for their brain.

  28. NFS? by the_radix · · Score: 1

    Trying to do this over networked file systems would be a pain. Imagine trying to copy a remote file that might be changing. Also, if the process was revived in a different environment, you'd have problems in general, be it a new processor, a new hard drive, anything.



    That said, I think this is a Good Idea(TM). But it would have to be implemented on a per-process basis, not just as a general system daemon. Imagine if power failed, and every single process was suddenly "remembered". You'd have to have enough hard drive space, memory.... And if you ran out, it would be hard for the OS to figure out which ones to save and which ones not to.

    --
    This .sig is either false or a paradox.
    1. Re:NFS? by Anonymous Coward · · Score: 0
      How about you tell the shell you want a particular process and its children to be "persistent" processes, sort of like putting a program into the "way-background". Make a new shell operator, use "&&" instead of "&". Example:
      $program_i_want_to_stay_in_memory_even_after_a_reboot &&
  29. I took a quick look... by eXtro · · Score: 2, Funny
    through my engineering library and I found a similar situation. A massive computer system, completely one of a kind, was destroyed prior to providing the solution to the problem for which it was designed. Recalculating the solution from scratch would take far too long, but there was one possibility. One of its computational units was still intact and the answer was surmised to be embedded deep within its memory.


    I think the same solution would apply here: Find Arthur Dent.

  30. BIOS feature by tongue · · Score: 1

    This is implemented in the BIOS on my laptop, an HP Pavilion. If memory serves, it's running a Phoenix BIOS of some sort. It requires about a half-gig partition on the hard drive dedicated to the hibernation process; I think it has to be the first one on the drive. But basically it just copies the contents of memory to that partition, then loads it back up. Works like a champ, and it's OS independent. Actually, I've found it works even better under Linux than 2k.

    Of course, as usual, I could be full of shite...

  31. Sun SparcStation 5 by Anonymous Coward · · Score: 0

    My Sun SparcStation 5 has this feature. Last time I checked it had an uptime of over a year, despite the fact that we moved offices ten months ago. I just suspended it before moving it.

  32. Connectix VirtualPC will do this by avitzur · · Score: 1

    This is a feature of Connectix Virtual PC which can also host Linux. Of course, it has the advantage that it is simulating all the hardware in software.

    1. Re:Connectix VirtualPC will do this by Anonymous Coward · · Score: 0

      That's an advantage? He was running a process that was taking days. The lack of performance in emulation is annoying when you're interacting with it, but when you're running a process that takes hundreds of hours, that lag adds up.

  33. No need, my good man by JohnTheFisherman · · Score: 2, Offtopic

    The answer is 42. :D

    1. Re:No need, my good man by mscout1 · · Score: 0

      But What is the Question?

      --
      ------- I saw a VW Beatle the other day. The vanity Plates said "FEATURE"
    2. Re:No need, my good man by mrpotato · · Score: 1
      by JohnTheFisherman
      The answer is 42. :D

      That would be the age of the captain?

      --

      cheers
    3. Re:No need, my good man by Anonymous Coward · · Score: 0

      ho, ho, ho, never heard that before, and it got modded to 4 (talk about a bunch of newbies)

    4. Re:No need, my good man by spitzcor · · Score: 1

      I think that there is another important reason to checkpoint an application. DEBUG

      If I'm running a buggy app and I want to figure it out, it could take an awful amount of work to reproduce the circumstances. But not if you checkpoint!

      Just ckpt your app and you can start debugging from that point over and over again. You can also send your ckpt to a support person so they could help.

      -spitzcor

    5. Re:No need, my good man by Anonymous Coward · · Score: 0

      read the book, mate...

  34. Condor by roukounas · · Score: 1

    Condor http://www.cs.wisc.edu/condor uses a checkpointing mechanism for migrating processes between hosts (works on Unices & Win). Not exactly what you need, but maybe a starting point.

    1. Re:Condor by Anonymous Coward · · Score: 0

      Ah, but can you checkpoint a process running under winblows, reboot the machine under Linux, then re-start the checkpointed process under wine?

    2. Re:Condor by jasonzzz · · Score: 1


      The Condor scheduler is alive and well. There are many more such schedulers that are hooking up disparate computing resources into a unified and ubiquitous computing platform. Others like The Grid and Globus come to mind. Check out:

      http://www.ahpcc.unm.edu/Systems/Documentation/Condor/

      http://www.griphyn.org/news/press/CMS_Success_Story_June_2001-jt-web.htm

      The feature of suspending and then restarting or migrating a process from host to host is a sort of "checkpoint and restart". Many high performance computing problems run over lengthy periods of time (sometimes a single problem can crunch data continuously for many months at a time) and cannot afford to be caught by a failure several months into a set of calculations and then have to start over. They therefore use these features to repeatedly "save" and "checkpoint" both the process and the data at specific intervals.

      All this, however, depends on how readily the kernel and other parts of the OS are built to let the user get at the actual machine state and save it off.

  35. Java by jhines0042 · · Score: 1

    You know what, I bet that something like this could be done really easily in Java. Suspend the VM, store its state, then when the system boots back up you restart the VM with an argument to restore the state. Would also be great for debugging purposes or sending in bug reports (Here's a copy of my VM state when it hung).

    ....

    I'm starting to drool over the possibilities here.

    --
    42 - So long and thanks for all the fish.
    1. Re:Java by Anonymous Coward · · Score: 0

      The Squeak Smalltalk system (www.squeak.org) does this. I just suspended a long-running computation by hitting cmd-., then saved the image and quit.

      When I start up the image again, the state of the machine is exactly as when I had stopped it, and I can continue that thread of execution by clicking "Proceed"

    2. Re:Java by Anonymous Coward · · Score: 0

      You can serialize objects to file

  36. Resurrecting core files by robbo · · Score: 2

    I've always wondered how hard it would be to resurrect a core file. One would think that there's enough info in a complete core to reopen all the open fd's, and possibly even reinitiate network connects. Everything else is there-- program counter, stack, heap, etc. As such, one could 'kill -ABRT' the process and revive it again later. Has anyone seen this done?

    --
    So long, and thanks for all the Phish
    1. Re:Resurrecting core files by chibitoku · · Score: 1

      Emacs used to do that to speed loading. Sendmail too I think...

    2. Re:Resurrecting core files by RFC959 · · Score: 1

      Some Perl FAQs have mentioned using 'undump' to generate an executable of sorts from a Perl script - you write your script and make it dump core just as it begins, then undump it back into an executable - but this has always been deprecated, I think.

    3. Re:Resurrecting core files by reverius · · Score: 1

      It's very deprecated now. There's a much better way to make executables out of Perl...

      it's called "perlcc". Run that command; your Perl distribution probably comes with it. It simply compiles Perl into an executable by first translating it to C (with the help of libperl) and then compiling.

      I don't know how well that approach works for more complicated Perl apps w/ modules, though...

  37. Suspend by selectspec · · Score: 4, Informative

    You can't just serialize and page out one process. Under every process are a slew of kernel objects and kernel crud including the virtual to physical mappings of your address space. It would be quite a challenge to isolate all of this and somehow persist it.

    To make suspend work, you'd have to dump your entire memory image to disk. Then you swap in the entire image, kernel and user pages alike.

    --

    Someone you trust is one of us.

    1. Re:Suspend by arkanes · · Score: 2

      Which is exactly how Windows does it. This even seems to work with memory-intensive games that manage their own swap, like Diablo 2.

    2. Re:Suspend by Anonymous Coward · · Score: 0

      While it is true that there are a lot of things like virtual to physical mappings, it is not interesting to save these.
      You need to save your virtual address space: a list of memory blocks and the virtual addresses where they appear. No need to save the mappings to physical addresses, these will be re-created once you load it again.
      (of course you would need to save mmap()s to physical addresses and files)
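
      On Linux, that list of memory blocks and their virtual addresses is what /proc/<pid>/maps exposes, one mapping per line. A rough sketch of reading it follows; the names Mapping, parse_maps_line and read_own_mappings are made up for this example, and it only lists the regions. Actually dumping and restoring their contents is a separate and much harder job.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int            # first virtual address of the region
    end: int              # one past the last virtual address
    perms: str            # e.g. "r-xp"
    offset: int           # offset into the backing file, if any
    path: Optional[str]   # backing file or pseudo-name, None if anonymous


def parse_maps_line(line):
    """Parse one line in the /proc/<pid>/maps format.

    Fields are: address range, permissions, offset, device, inode and an
    optional pathname.
    """
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else None
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1],
                   int(fields[2], 16), path or None)


def read_own_mappings():
    """List this process's mappings. Linux only: needs /proc/self/maps."""
    with open("/proc/self/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

      File-backed regions (those with a path) could in principle be re-mapped from the file on restore, so only the anonymous and modified pages would need saving.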

  38. Should be easy by Anonymous Coward · · Score: 1, Informative

    (I'm a Solaris user but I assume Linux is the same.) You can type Ctrl-\ to get a core dump of a running process, and you can load a core dump with dbx. It seems like that's 90% of the infrastructure. You'd want it to run outside dbx and do it automatically. My guess is you'd have to just remap some addresses, recreate file pointers (assuming said files haven't been modified), reinstate the stack, and go.

    This should be even easier to do in a JVM, even not relying on their serialization stuff.

  39. This CAN be trivially done on any un*x i know... by ugen · · Score: 2, Redundant

    1) Produce the core dump of a process
    2) Use the core and process image to restart it
    (for example in the debugger such as gdb, if you
    don't want to write specialized software).

    To the best of my knowledge perl "compiler" uses
    precisely this technique to produce perl "executables" - dumps them out as a core right
    after compilation and reuses it later on.

    You can do this to a kernel as well, if you
    REALLY want to.

    However, since indeed many things may be dependent
    on state of kernel, files, network connections, devices etc. etc. doing this is not advisable.

    Good coding practice for long-running processes is
    to actually spend some time on writing the state
    saving functionality to support process restart.

    Anyway, (call it a flame if ya will) but the fact
    that /. posts this as a relevant question is very
    disquieting - level of technical knowledge here
    gets reduced day after day.

  40. Solaris Suspend & Resume by morcheeba · · Score: 3, Informative

    I've used the Suspend/Resume feature on a sun box. IIRC, it mostly worked, but with a minor hitch that made me worry enough to never do it again. This suspend/resume is just like the laptop version -- save a copy of all memory to disk -- not the cryogenic per-process version you're talking about.

    The per-process sounds neat, but usable only if you've got a simple critical task you're running. For a more complicated application, multiple processes may be working together, and you'd have to suspend all of them at the same time.
    One big question I would have would be file handles... if you restore a process that thinks it owns file handle #5 and some other process is already using it, it would be awkward to get either process to use a different handle.
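
    On the file handle question: on Unix, descriptor numbers are local to each process, so another process holding its own descriptor 5 does not collide with the restored one. What the restorer does need is to put the reopened file back at the same number inside the restored process, and dup2() does that. A small sketch, assuming the checkpoint recorded the path, descriptor number and file offset (restore_descriptor is a hypothetical helper, not part of any checkpointing tool):

```python
import os


def restore_descriptor(path, wanted_fd, offset, flags=os.O_RDONLY):
    """Reopen `path` and place it at descriptor number `wanted_fd`.

    Assumes the file still exists and has not changed since the
    checkpoint was taken.
    """
    fd = os.open(path, flags)
    if fd != wanted_fd:
        # dup2 silently closes wanted_fd first if it was already open.
        os.dup2(fd, wanted_fd)
        os.close(fd)
    # Put the read/write position back where the process left it.
    os.lseek(wanted_fd, offset, os.SEEK_SET)
    return wanted_fd
```

    Sockets and pipes are another matter, since there is no path to reopen and the other end has its own state.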

  41. Virtual PC on macs... by Slef · · Score: 1

    There is a feature of VirtualPC on Macs that does this. If you try to exit the emulator before shutting down the emulated machine, VirtualPC asks if you want to save the memory. If you say yes, the whole memory of the emulated PC is saved in a file, and you can continue using the PC later, exactly where you left it last time.
    The other nice thing about this is that restoring the memory is much faster than rebooting.
    You can also save several sessions and start again with the one you want.

    Of course it would be nice to do all that for just one process, or maybe even for all of them on a UNIX machine...

    --
    -- Slef
  42. Future of Process Management by gehrehmee · · Score: 3, Interesting

    First, let me say that what the poster is suggesting sounds a little more sophisticated than a simple re-implementation of XP's hibernate function, although functionality like that under UNIX would certainly be invaluable. It sounds like the poster wants control over individual processes, something that I consider far more interesting.
    What's said here is certainly very reasonable. But the extensions of what's being suggested are even more fantastic. Once a process is completely removed from memory, with file handles and storage and status all kept away safely, is there any reason that the process is really tied to that computer? Why wouldn't it be possible to take that 'frozen' process, transfer it to another machine with access to the same filesystem on some level (some translation of file handles would likely be necessary), and thaw it there, allowing someone to move a running process to another machine? Need to replace your web server's only CPU, but don't want downtime? Move the process to a backup machine, replace the original's hardware, and move the process back.
    I even thought I had heard that someone was working on just such a project, or at least thinking about the details of implementing it. (I'm just getting started in learning UNIX internals myself). Anybody have more references to information on this sort of thing?

    --
    "You know, Hobbes, some days even my lucky rocketship underpants don't help" -- Calvin
    1. Re:Future of Process Management by Rudy-Omega · · Score: 0

      The process is called checkpoint/restart. It does exist in the UNIX world, mainly in the HPC markets.

  43. SNES9X, ZSNES, etc do this by Raster+Burn · · Score: 1

    Many of the old console emulators do this exact thing. It's really nice to save the state of the game, and load it back up exactly where you left off. It makes it really easy to cheat too!

    I was wondering the exact same things about other applications, or the whole OS itself! Woudn't it be much faster to have an on/off button that would save and load the state of the computer? Then you wouldn't have to boot it up and down, you can just pick up where you left off. Maybe this isn't feasible, I don't know.

    1. Re:SNES9X, ZSNES, etc do this by zhar · · Score: 1

      While Zsnes and Snes9X do do this, they are capturing an entire system state. A Super Nintendo's hardware will never change, and the hardware itself is only 16-bit. When a person is discussing a 32-bit system such as an x86-based PC, the problems increase exponentially. First, the Super Nintendo only has 128KB of conventional RAM and 64KB of video RAM. This makes the saved states very small. If a PC with 2GB of RAM had to save a system state to disk, writing it out would take an enormous amount of time. If you only had a short amount of time before a UPS dies, then you would not want to be wasting it trying to save the entire system to disk.

      --


      DRINK DUFF (responsibly) DRINK DUFF (responsibly) DRINK DUFF
  44. Hibernate?? by Slitwrist · · Score: 0
    I'm not sure that hibernate is what is wanted.
    All that does is make all your apps still run, and the apps may not be actually DOING anything.

    Freezing every calculation, right down to what was in RAM at that very moment... I don't know. Might be a little difficult. Or not? We have been able to freeze and thaw light, for God's sake.

    --
    Carpe Noctem -=- Seize The Night
  45. Can be done? yes. Easily? Ummmm. by CodeShark · · Score: 1
    I am not sure whether this can be done at the OS level, or whether it is much more easily done at the application level. We wrote some telco (telecommunications) software for which we went through progressive failure scenarios and restarts, and came up with "software phases" that could be restarted following nearly every type of failure. Similarly, Oracle, etc. have recovery modes where a person installs the last backup and then re-applies all of the delta log files, resulting in a restore up to the point of failure (though it can take a few hours -- but my experience on this is with a 100 Gig plus database).

    So it seems to me that if this were going to be done at the OS level, the OS would need some kind of integration with a database, and apps needing to "freeze" would need a standard method of saving the last completed intermediate phase and deltas into the OS database for later re-activation.

    I don't personally know of any software/OS combination that does this well, but am admittedly not an OS know-it-all, and look forward to responses from the rest of the /. community.

    --
    ...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
  46. different approach: Savepoints by esonik · · Score: 2, Interesting

    A different solution, which is very common for long running processes, is to use savepoints, i.e. save the state of the process regularly to a file at suitable points of the algorithm. Once your process dies or you kill it, you can restart from that savepoint. If your state information is very large, you can stretch the save interval to reasonably long times, e.g. several hours. Typically you don't mind losing some hours of calculations due to an occasional power outage.

    Of course this solution is not as general as the "process cryogenics" you describe, but it's also easier to implement because you have more information about the problem.

    1. Re:different approach: Savepoints by jgerman · · Score: 2

      Yes, this is similar to what I've done in applications, and it's especially easy in an OO environment. Coded correctly, you can view your process as a virtual machine, one that has a fixed instruction set. Serializing all of the data and dumping it to a file will allow you to pick up where you left off. Of course this is per application, but it is relatively simple to build into your app when you write it.

      --
      I'm the big fish in the big pond bitch.
    2. Re:different approach: Savepoints by smillie · · Score: 1

      I did something like this some time ago for a program that needed to keep its state between crashes and reboots. I would periodically write to alternate state files with a calculated checksum. On program restart I checked the newest file for a valid checksum and used the older one if the checksum failed. This way a power off in the middle of writing one of the files didn't lose my state completely.
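
      A rough reconstruction of that scheme in Python, not the original program. The file naming (".a" and ".b" suffixes), the SHA-256 checksum, the JSON payload and the use of a sequence number rather than file timestamps to decide which file is newest are all assumptions for the sake of the example.

```python
import hashlib
import json
import os


def _paths(base):
    """The two alternating state files."""
    return [base + ".a", base + ".b"]


def save_state(base, state, seq):
    """Write `state` to slot seq % 2, prefixed with a checksum line."""
    payload = json.dumps({"seq": seq, "state": state}).encode()
    digest = hashlib.sha256(payload).hexdigest().encode()
    with open(_paths(base)[seq % 2], "wb") as f:
        f.write(digest + b"\n" + payload)
        f.flush()
        os.fsync(f.fileno())


def _read_slot(path):
    """Return the decoded record, or None if missing or corrupt."""
    try:
        with open(path, "rb") as f:
            digest, _, payload = f.read().partition(b"\n")
    except FileNotFoundError:
        return None
    if hashlib.sha256(payload).hexdigest().encode() != digest:
        return None  # torn or damaged write
    return json.loads(payload)


def load_state(base):
    """Return (seq, state) from the newest valid slot, or None."""
    records = [r for r in map(_read_slot, _paths(base)) if r is not None]
    if not records:
        return None
    newest = max(records, key=lambda r: r["seq"])
    return newest["seq"], newest["state"]
```

      Because each save overwrites only the older of the two files, a write that is cut off halfway can damage at most one slot, and the other still holds the previous good state.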

      --

      Dyslexics Untie!

    3. Re:different approach: Savepoints by anonymous_wombat · · Score: 1

      Any program that takes days/weeks/centuries to finish should be writing out its partial state on a periodic basis so that it can resume from the partial state later. I realize that this is a different problem than having a binary that works but for which you do not have the source code.

  47. No reason not to by NaturePhotog · · Score: 2

    There's no reason why you can't do it either in an app by saving state or in the OS by saving memory to disk as on a laptop.

    GEOS had the concept of state-saving in the OS circa 1990, so it's nothing new. The UI saves its state, what apps are running, what windows are open, etc. and restores it exactly as you left it when you restart. If an app has extra data to save, such as where it was in a lengthy computation, it can save it, too.

    A slightly different approach than brute-force writing out all of used memory, but both work quite well with the speed of current hard drives.

    1. Re:No reason not to by rsmith · · Score: 1

      I would agree that it might be doable on a per-process basis (if you're willing to look over the already raised objections like broken network connections and removed files etc.)

      But doing it to the complete OS would also mean saving and restoring the state of all the hardware devices (e.g. the OpenGL state machine in your graphics card, or your modem).

      I'd imagine that not every piece of hardware has the ability to dump and restore its state on command. Worst case, you might have to replay every command sent to a device since a known state (say, after boot-up).

      I'd say that this is non-trivial for a PC that is not designed for it. PDAs can do it, obviously, but I would guess they're specially engineered for it.

      Roland

      --
      Never ascribe to malice that which is adequately explained by incompetence.
  48. BeOS by realnitwit · · Score: 1

    This is a semi-feature in BeOS. For instance, the first time you boot (or if you modify drivers, etc.), it takes a while because it has to load all of those drivers, do initialization, etc... However, it also appears to save a snapshot of everything required to boot again, because the next time you boot, it only takes 10 to 15 seconds.

    I may be wrong about exactly how it does it (i.e. snapshot), but it works as if it had done that. For application stuff, this is MUCH harder, unless you save out the memory of each application to disk, and hope that any hardware they need doesn't change during the next boot. There are lots of little niggles, however, and this problem isn't light development.

  49. Checkpoint/restart by td · · Score: 3, Interesting

    This facility is called checkpoint/restart. It was a feature of OS/360 and other operating systems in the 1960s. In some very early versions of Unix, core files were restartable. Usually it's pretty easy for programs to save enough state to be restartable on a case by case basis, except when it's just about impossible (like when networks reconfigure) so it's not a popular system feature these days (hard to implement in a general way, doesn't do a very good job in the cases that can be handled easily.)

    A friend of mine (Hugh Redelmeier) ran a very long (~400 day) computation on a PDP-11 in the mid-1970s. The program ran stand-alone, and part of the test plan involved flipping the power switch on and off a few times -- very amusing to watch the program keep on running right through power failures. (Main memory on the machine in question was magnetic cores, which are non-volatile.)

    --
    -Tom Duff
    1. Re:Checkpoint/restart by shaper · · Score: 2

      I was peripherally involved in some early efforts to include checkpoint/restart in POSIX with respect to standardizing fault tolerance and high availability features. I was a US DoD employee at the time. The military's interest was to be able (in a semi-portable standard way) to reset to a known good previous state in the case of some arbitrary failure mode in safety critical systems, i.e. flight controls, stores (weapons) management, etc. AFAIK, the POSIX standards efforts never went very far due to many different, sometimes conflicting needs. The more business-oriented high availability people had needs for very similar OS functionality that was markedly different in character from the military's viewpoint. My involvement ended in the early to mid 90's, so my understanding of the situation may be more than a little stale.

    2. Re:Checkpoint/restart by Malc · · Score: 1

      "A friend of mine (Hugh Redelmeier) ran a very long (~400 day) computation on a PDP-11 in the mid-1970s"

      I wonder how long that would take on one of today's computers?

    3. Re:Checkpoint/restart by td · · Score: 2

      I made a version of the same program and reran it a few years ago on an SGI Octane. It took about 8 days.

      --
      -Tom Duff
  50. VMWare by Creedo · · Score: 2, Informative

    VMware does this for the VMs it hosts. Works great.

    Creed

    --
    All that is necessary for the triumph of good is that evil men do nothing.
  51. Build in persistence yourself. by blair1q · · Score: 5, Insightful

    Any program that you intend to run for more than a day or two you should checkpoint its intermediate results to disk, even if this adds 100% to the run time.

    --Blair

    P.S. Alternatively, you could write a program to have the rebooted computer pull scrabble tiles from a bag structure and print them to the screen. You might at least get some clue as to whether it was asking the right question.

    1. Re:Build in persistence yourself. by kaisyain · · Score: 1

      And any program that intends to use more RAM than is physically present should implement a virtual memory mechanism? Some features get implemented in the kernel because we don't want everyone to have to reinvent the wheel. Kernel support for checkpointing processes is one such thing.

    2. Re:Build in persistence yourself. by dillon_rinker · · Score: 3, Insightful

      Re-read the comment you replied to; it suggests something subtly different from what you suggest. Checkpointing intermediate results is not the same thing as checkpointing processes. To take a much oversimplified example, I write a program to multiply a two-digit number by a one-digit number. My program does the following:

      1. Multiply ones digits
      2. Multiply tens digit by ones digit
      3. Multiply previous result by ten
      4. Add results from steps 1 & 3
      5. Display previous result.

      If my program crashes at any point before step 5, I have to start all over. So, I save my intermediate results at step 1, step 2, step 3, and save my final result at step 4. This is checkpointing my intermediate steps.

      Your suggestion, on the other hand, is to periodically save the entire system state. This is checkpointing the processes.

      I see a need for both types of checkpointing - applications periodically checkpointing data (like the autosave feature in the market-leading word processor) and system-state saves (like the sleep feature of some laptops). Reliability and recoverability should be engineered in at all layers.
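
      The five-step example above could be sketched like this, as an illustration of checkpointing intermediate results rather than the whole process. The function name, the JSON file and the idea of storing the inputs alongside the results (so a stale file from a different multiplication is ignored) are assumptions added for the example.

```python
import json
import os


def multiply_with_checkpoints(two_digit, one_digit, path):
    """Multiply a two-digit number by a one-digit number, saving each step."""
    inputs = [two_digit, one_digit]
    results = {"inputs": inputs}
    # Resume from saved intermediate results if they match these inputs.
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        if saved.get("inputs") == inputs:
            results = saved

    def step(name, compute):
        # Compute a step only if it is not already saved, then persist it.
        if name not in results:
            results[name] = compute()
            with open(path, "w") as f:
                json.dump(results, f)
        return results[name]

    tens, ones = divmod(two_digit, 10)
    s1 = step("step1", lambda: ones * one_digit)  # 1. multiply ones digits
    s2 = step("step2", lambda: tens * one_digit)  # 2. tens digit by ones digit
    s3 = step("step3", lambda: s2 * 10)           # 3. previous result times ten
    s4 = step("step4", lambda: s1 + s3)           # 4. add results of steps 1 and 3
    return s4                                     # 5. the caller displays this
```

      If the program dies after step 2, the next run reads step1 and step2 from the file and only recomputes steps 3 and 4. Nothing about the process itself (stack, registers, open files) is saved.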

    3. Re:Build in persistence yourself. by cweber · · Score: 1

      Any program that you intend to run for more than a day or two you should checkpoint its intermediate results to disk, even if this adds 100% to the run time.

      Amen! And it probably won't add 100% to the runtime, more like a few %. Plus, even apps that run for a few hours at a time could benefit.

      In the long run you'll save yourself so much lost compute time that you'll be glad you did it.

    4. Re:Build in persistence yourself. by Erasmus+Darwin · · Score: 2
      "Any program that you intend to run for more than a day or two you should checkpoint its intermediate results to disk, even if this adds 100% to the run time."

      That seems rather wasteful. The whole point of checkpointing is to avoid having to waste time recalculating things. Since you're trading off between two potential wastes of time, it's a more complicated issue than you make it out to be.

      For example, imagine a scenario where you have many, many jobs to run. Each job takes a week to run. Your goal is to run the most jobs in a given time period. Checkpointing doubles the run-time, kicking it up to two weeks. Finally, let's say there's a 1% chance per day that the system will go down for the day.

      That means there's a 93.2% chance that we'll make it through a non-checkpointed job without failing. Even if we do fail, there's a 93.2% chance that we'll make it through the rerun. If we make it through either time, our worst case scenario is to tie with the checkpointed job.

      Still, occasionally, a non-checkpointed job will hit multiple failures and take longer than a checkpointed one. But under the constraints I provided, it should be clear that checkpointing's going to lose in the long run.

      All that being said, there are certainly scenarios where checkpointing is the better choice, such as when it's more important to get the jobs done within a certain deadline or when the failure rate is higher. But it's absurd to declare checkpointing to always be the optimal solution.
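The numbers in this scenario can be checked with a few lines. This is a rough sketch under one assumed model: a failure costs the day it happens on, and a non-checkpointed job then loses all progress and starts over. The function names are made up for the example.

```python
def expected_days_without_checkpoint(job_days, p_fail):
    """Expected calendar days until `job_days` consecutive failure-free days occur.

    Uses the standard result for the waiting time to n consecutive successes:
    (q**-n - 1) / (1 - q), with q the daily success probability.
    """
    q = 1.0 - p_fail
    return (q ** -job_days - 1.0) / p_fail


def expected_days_with_checkpoint(job_days, p_fail):
    """Expected calendar days when a failure only costs the day it happens on."""
    return job_days / (1.0 - p_fail)


p = 0.01
print(f"P(7 clean days)          = {(1 - p) ** 7:.3f}")
print(f"no checkpoint, 7-day job = {expected_days_without_checkpoint(7, p):.2f} days")
print(f"checkpointed, 14-day job = {expected_days_with_checkpoint(14, p):.2f} days")
```

Under these assumptions the non-checkpointed job averages a little over 7 days and the checkpointed one a little over 14, which matches the argument above. Lower the checkpoint overhead or raise the failure rate and the comparison shifts.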

    5. Re:Build in persistence yourself. by blair1q · · Score: 2

      If you know the MTBF on your computing system (including every necessary system all the way back to the watershed that's driving the hydroelectric plant) then yes, you can do a cost-benefit analysis.

      But if you're a yahoo at J. Random University who's just writing in his thesis, you're going to type :w every few words, no?

      If the system it runs on is out of your control, and you have no idea of the probability of a crash in the next few weeks, and you only have one or two shots to get it done, you need to maximize the robustness.

      But yea, you're right, don't get paralytic about it. Just organize your data and state info into a data structure that can be serialized to a file and read back in later.

      --Blair
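One way the "serialize it to a file and read it back" advice might look. The `RunState` fields are placeholders, not anything from a real program; the write goes to a temporary file first so that a crash mid-save leaves the previous file untouched.

```python
import json
import os
from dataclasses import dataclass, asdict, field


@dataclass
class RunState:
    """Everything needed to resume; the field names are illustrative only."""
    iteration: int = 0
    partial_sum: float = 0.0
    history: list = field(default_factory=list)


def save(state, path):
    """Serialize to a temp file, flush it to disk, then atomically swap it in."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(asdict(state), f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load(path):
    """Read the state back, or start fresh if no file exists yet."""
    try:
        with open(path) as f:
            return RunState(**json.load(f))
    except FileNotFoundError:
        return RunState()
```

A program would call `load()` once at startup and `save()` whenever it reaches a point it would not want to recompute.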

    6. Re:Build in persistence yourself. by jungd · · Score: 1

      This is exactly what persistent OSes try to avoid. If the whole system is persistent, then none of this suspend/resume voodoo is needed, because the state of the OS persists forever anyway.

      check out this for some examples of what I mean.

      --
      /..sig file not found - permission denied.
    7. Re:Build in persistence yourself. by Mryll · · Score: 1

      Points taken...

      What you perceive as "failures" might be considered as unplanned interrupts in some applications, that can occur quite frequently.

      For example, in petroleum reservoir simulation, it is common to play out a scenario multiple times with user-driven data changes such as adding wells or changing other boundary parameters controllable by an engineer, seeking to get the best quality production from a field over its lifetime. The timestep at which a calculation might be interrupted to implement changes is unknown. Computation of a timestep is expensive and slow. Checkpointing clearly makes sense.

      I guess I'd prefer the words "interrupted run" to failure... :)

    8. Re:Build in persistence yourself. by uid8472 · · Score: 1

      I see a need for both types of checkpointing - applications periodically checkpointing data (like the autosave feature in the market-leading word processor) [...]

      Oh, you mean Emacs?

    9. Re:Build in persistence yourself. by gotan · · Score: 2

      As long as the 'interrupted run' is not due to some moron switching off their workstation when an application of theirs hangs (in a university environment, say), what you describe is under control of the person running the program. Also, in the case you describe, altering parameters during runtime seems quite common. Yet i wonder why it isn't known in advance when (after which iteration) someone might want to change the parameters (so you could make the program stop after that iteration and dump the data then).

      Also a better idea might be to have the program look for new/changed parameters itself (by means of a special input file, and maybe sending a signal), or make it stop itself and write a dump by some mechanism controlled from outside (a signal or a 'stop'-file it looks for).

      Then i wonder how writing a dump would produce a 100% overhead (that would mean, half the time the program is writing a dump, it should alternate between dump-files then, so there's at least one valid dump at any time), and be worth it. Usually, on large supercomputers handling numerically intense programs such as you describe, there are means of fast I/O too, which provide the necessary bandwidth to dump that data fast, or at least nonblocking.
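A sketch of the outside-controlled stop described above: the loop finishes its current iteration, then dumps if either a signal has arrived or a file named `stop` exists. The choice of SIGUSR1, the file names, and the state layout are all assumptions for illustration. Dumps alternate between two files so one intact copy always survives.

```python
import json
import os
import signal

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only set a flag; the loop dumps at the next safe point."""
    global stop_requested
    stop_requested = True


def install():
    """Ask for a dump with e.g. `kill -USR1 <pid>` (call from the main thread)."""
    signal.signal(signal.SIGUSR1, request_stop)


def run(total_steps, workdir, state=None):
    """Iterate until done, or until a signal or a 'stop' file asks for a dump.

    Returns (state, finished).
    """
    global stop_requested
    if state is None:
        state = {"step": 0, "value": 0, "dumps": 0}
    stop_file = os.path.join(workdir, "stop")
    while state["step"] < total_steps:
        state["value"] += state["step"]   # stand-in for one expensive iteration
        state["step"] += 1
        if stop_requested or os.path.exists(stop_file):
            stop_requested = False
            # Alternate between two files so one intact dump always survives.
            slot = state["dumps"] % 2
            state["dumps"] += 1
            with open(os.path.join(workdir, f"dump{slot}.json"), "w") as f:
                json.dump(state, f)
            return state, False
    return state, True
```

Passing the returned state back into `run()` continues from the iteration after the dump, which is also the natural place to pick up changed parameters.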

      --
      "By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks
  52. Related stuff by shankark · · Score: 1

    There is some ongoing work on hibernation and process checkpointing. ACPI4Linux is an attempt at implementing the ACPI specification for Linux. This is different from APM though, and the product is quite preliminary. There's also an interesting site on process checkpointing, migration and resumption. Basically, it's implemented as a kernel module that, upon invocation, freezes the scheduler, dumps all process-related information into a separate hibernation partition and shuts off.

    HTH,
    Shankar

  53. Java Serialized Object by Anonymous Coward · · Score: 0

    If only you were using Java; you could have included a trigger to stop the process, serialize it, send it to another server, and continue the process until completion.

  54. Virtual PC... by Da_Big_G · · Score: 1

    from Connectix (I think) does this... of course, it's not a "whole system" solution (I run Win2k as host, then Virtual PC with RedHat 7.0) and it saves the state of the linux machine to disk whenever I shut it down. This works pretty well for me, but might not be so great for huge number crunching, as the Virtual PC is always a lot slower than the host OS. Still, it might be worth looking into for some people.

  55. More robust software by Analog+Squirrel · · Score: 1

    The most straight-forward solution to the data loss problem is to design the software to maintain its own restart data. I've spent about a year working on an atmospheric simulation that typically takes several days to run. We wrote the sim program to dump its current state every hour or so, that way in the event of catastrophe (power outage, OS crash, whatever), the most we'd lose is an hour's worth of computation. Of course, this requires that you have enough access to the innards of the program to do this...
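The "dump every hour or so" policy can be kept separate from the dump itself. A small sketch, with a hypothetical `CheckpointTimer` class; the clock is injectable so the interval logic can be exercised without waiting an hour.

```python
import time


class CheckpointTimer:
    """Tells a long-running loop when the next state dump is due."""

    def __init__(self, interval_s=3600.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.last = clock()

    def due(self):
        """Return True at most once per interval; the caller then writes its dump."""
        now = self.clock()
        if now - self.last >= self.interval_s:
            self.last = now
            return True
        return False
```

The simulation loop would call `timer.due()` once per timestep and write its restart data whenever it returns True, so the cost of dumping stays bounded no matter how fast the timesteps run.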

    --
    I'd rather be flying
  56. Software suspend by smartin · · Score: 1, Redundant

    There is or was a project to suspend the whole os to disk. Details are here: http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

    --
    The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
  57. User Control by Skweetis · · Score: 2, Interesting
    It would be neat if this could be controlled by the user. Ideally, this would be done by a process signal. To actually cause a process to hibernate, a user would do a kill -HIB $PID or something like that. Then the kernel would save the process information to a file (somewhere under /var maybe?) until it is restored.

    This next one would complicate things a bit: the user should also be able to wake up the process the same way, i.e. kill -WAK $PID. This means that an index of hibernated processes also needs to be kept synchronized between the kernel process tables and a file on disk, to be preserved between reboots.

    Maybe I'll write another kernel patch...

    1. Re:User Control by Anonymous Coward · · Score: 0

      So what you are asking for is:

      Ctrl + Z

  58. Been there, done that by jstott · · Score: 2, Informative
    Look at the makefile for emacs--the emacs executable is essentially a memory dump of a partially initialized emacs process. Perl's dump and undump work the same way.

    For long-running processes, rather than shut down the process when the UPS kicks in, I've always found it easier to have the program snapshot its data tables periodically (say every half-hour) and build a "resume from disk" feature into the program. This lets you restart the program from its last check-point even in the event of uncontrolled program termination (e.g. kill -9 and the like).

    -JS

    --
    Vanity of vanities, all is vanity...
    1. Re:Been there, done that by Anonymous Coward · · Score: 0

      it would be worthwhile to implement some sort of memory state saving feature into a JVM... Such a feature would only need to be implemented once for the JVM, and would apply to all apps on that JVM. Then maybe our java hotspot apps would only take seconds to start instead of minutes!

  59. The hardware will be a big issue.... by King_TJ · · Score: 2

    The main reason this "suspend" feature works relatively well for a laptop is because the hardware is a "given". The laptop has to have a certain video card and motherboard chipset, specific type of hard drive, floppy, CD-ROM and sound device. (In fact, when laptops fail to come back up properly from a suspend, it's almost always the one "add-on" card people have in laptops, the PCMCIA network adapter, that causes the problem.)

    3Com PCMCIA cards are about the only ones I've used that allow the laptop to power them down and back up again, and resume network activity without a complete machine reboot.

  60. You could try Connectix's Virtual PC by guttentag · · Score: 1, Redundant

    You can run Linux in an isolated environment on your computer and when you want to freeze a process, VPC can save the state of the environment. When you thaw it hours or years later, the environment doesn't know any time has passed. Since VPC can run multiple instances on the same machine, you can put the critical process in its own environment.

  61. Hibernation comments are missing the point by ry4an · · Score: 5, Insightful

    The comments to the effect of "it's called hibernation, and has done it for years" are missing the point. That hibernation is a BIOS supported dump to disk. It's a feature on most laptops and works with just about any OS -- it's worked on my Linux laptop for years.

    I think the feature to be discussed is Operating System (not BIOS) level support of the hibernation of a single process. It'd be nice if I could do a:

    kill -HIBERNATE `cat /var/longoperation.pid`

    and have that program get frozen to disk. Then if I could resurrect just that process later it'd be a handy feature for the long running program that you want to postpone until after you've done whatever you needed to do in single user mode.

    1. Re:Hibernation comments are missing the point by Hrunting · · Score: 5, Insightful

      And if you have something like that, you open yourself up to a wealth of potential problems in the program. Take this simple perl script.

      #!perl

      use strict;

      my $pid = $$;
      print $pid


      If you stop it between those two $pid commands, there's no guarantee that you're going to get the same pid value back. Programs would have to be specifically programmed to handle this sort of thing (there are other examples, this is just the most basic; network programs particularly would have problems).
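The same hazard can be simulated without any kernel support by treating "saved state loaded into a fresh process" as a stand-in for a restore. This is only an analogy under that assumption; the function name and the JSON hand-off are made up for the example.

```python
import json
import os
import subprocess
import sys

# Code that runs in the "restored" process: compare the saved PID with its own.
RESTORE = (
    "import json, os, sys;"
    "saved = json.load(sys.stdin);"
    "print(saved['pid'] == os.getpid())"
)


def cached_pid_survives_restore():
    """Save a cached PID, 'restore' it in a fresh process, report whether it matches."""
    saved = json.dumps({"pid": os.getpid()})
    out = subprocess.run(
        [sys.executable, "-c", RESTORE],
        input=saved, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "True"
```

The call returns False: the value cached before the "freeze" no longer describes the process that carries on afterwards, unless the OS arranges to give back the old PID.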

    2. Re:Hibernation comments are missing the point by Anonymous Coward · · Score: 1, Insightful

      No one said process level hibernation would be easy, just that it would be nice. You've pointed out that the OS would at least have to provide some sort of pid reclamation system for it to be tenable.

    3. Re:Hibernation comments are missing the point by eries · · Score: 2

      And what an incredible debugging tool. I know that my process is producing buggy output after running for four hours. Solution: run for four hours, hibernate, copy and re-run the last five minutes as many times as you want.

    4. Re:Hibernation comments are missing the point by The+Smith · · Score: 2, Informative
      You mean like: run for four hours, force a core dump by pressing Ctrl-\, and then re-run the last five minutes as many times as you want?

      You don't need hibernation for that.

    5. Re:Hibernation comments are missing the point by Anonymous Coward · · Score: 0

      Who says the PID has to change? That's just another part that would have to be saved.

      Maybe the program could be seen as a process even though it really isn't a process.

    6. Re:Hibernation comments are missing the point by gorilla · · Score: 3, Insightful

      There are lots of other issues. If a program has a socket, or a device open, what should happen? Should the OS reopen the socket? What if the remote end is requiring status. No point reopening a FTP session if the application thinks it's already sent the userid/password but the server doesn't. What if it's a device, eg a modem, and it is locked?

    7. Re:Hibernation comments are missing the point by spitzcor · · Score: 1

      Getting the old pid is possible. The kernel can assign the newly started job its old pid unless it is in use. If it is in use then wait or die or do some magic.

      -spitzcor

    8. Re:Hibernation comments are missing the point by Kefabi · · Score: 1

      Windows XP supports this feature. My friend has a cracked version of XP, doesn't use a laptop, and he can hibernate to disk, pull out his power-supply if he wants to, and still come back to the same state later.

      This is not standby or sleep mode. Those are separate modes also supported by XP.

      I'm not a big fan of XP (or Microsoft for that matter), I use Debian GNU/Linux for home use, but to say Microsoft only employs idiots is a flat out lie.

    9. Re:Hibernation comments are missing the point by Petrus · · Score: 1

      If the socket/dev open status is stored in RAM (and it most certainly is), then it is most certainly stored.

      If you simply take your whole 25Mbyte of RAM, add the internal register dump, and dump it to a dedicated partition, you do not even have to sync the disks or look at the swap. The BIOS simply reloads the dumped RAM, reloads all CPU registers, and there it gets the PC program counter where it ended last time.

      The problem are the variables that are not in the RAM, e.g. CMOS contents, RTC time, parallel port initialization, CDROM RAM cache, sound card preset waveforms, your whole screen. These are dependent on the hardware and also have to be stored. Some of it should be restored (screen), some should not (System Time).

      If this is handled through BIOS on a closed system, all is fine because BIOS knows what to save/restore from where.
      If you are on a desktop system, you need to have for each driver a save and restore call to be able to handle the problem, because BIOS knows only about what is on the motherboard and cannot care about every card you might have plugged in.

      In short, there are problems, but somewhere completely different from where you perceive them.

      Petrus Vectorius.

    10. Re:Hibernation comments are missing the point by stevef · · Score: 1

      You could reserve the PID for the process when it is restarted. This already happens with zombies. The process control block of the child is not freed until the parent calls wait or exits.

      Of course, freezing and thawing across reboots would be an issue. The kernel doesn't save any state across reboots.

      -Steve
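The zombie behaviour mentioned here is easy to observe. A sketch for a POSIX system where Python's `os.waitid` is available; the helper names are invented for the example.

```python
import os
import subprocess
import sys


def pid_exists(pid):
    """True if the PID is still allocated (signal 0 only checks for existence)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    return True


def zombie_holds_pid():
    """Return (held_before_wait, held_after_wait) for a child that has already exited."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Block until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    before = pid_exists(child.pid)
    child.wait()                      # reap it; the kernel may now reuse the PID
    after = pid_exists(child.pid)
    return before, after
```

The PID stays allocated while the exited child is unreaped and is released once the parent waits on it. Reserving a PID for a frozen process would need something similar that also survives the parent exiting, and, as noted, a reboot.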

    11. Re:Hibernation comments are missing the point by Anonymous Coward · · Score: 0

      It sounds like XP has the software only half of it done, but it doesn't really get interesting until I suspend a single process. Dropping a long running process into hibernation, then heading into single-user mode to upgrade my kernel, and then restarting the process is more in line with what's being discussed here.

    12. Re:Hibernation comments are missing the point by redback · · Score: 2

      Ever used Windows 2000 or WindowsXP

      They have hibernate. Completely hardware independent hibernate.

      It works on anything that has proper drivers

      You will find it under power in the control panel in w2k, and it's on by default in wxp

    13. Re:Hibernation comments are missing the point by enkidu · · Score: 2

      Sorry, that is not correct. The state of most programs is not represented by the "memory/stack space" of the process + the register status alone. You have to remember that the kernel is also part of the space in which most processes run. Add in network sockets and device handles and inter-process semaphores and hibernation gets really complicated really quickly. The way around that is to restrict yourself to a small(er) set of system calls, which is what Condor does I believe.

      In fact most "checkpoint anytime" systems allow you to delineate atomic sections of code where checkpointing/hibernation should not happen. The only way to allow true checkpoint/hibernation anywhere is to build it explicitly into the kernel.

      --

      There is no trap so deadly as the trap you set for yourself
      -Raymond Chandler, The Long Goodbye
    14. Re:Hibernation comments are missing the point by cpt+kangarooski · · Score: 1

      Being able to suspend individual processes is interesting, but being able to bind groups of processes together is where it can get even more interesting.

      I'd love this for the sorts of projects I work on.
      Group one might be Photoshop and a particular email so that I know what I was working on, and have it left just as it was. But I might want to swap over to group two, which would be a text editor and a web browser, so that I could make changes.

      Having the groups hiding and showing themselves could be great in some situations.

      --
      -- This and all my posts are in the public domain. I am a lawyer. I am not your lawyer, and this is not legal advice.
    15. Re:Hibernation comments are missing the point by Dwonis · · Score: 2

      Actually, you only need to send a SIGSTOP to the applications themselves, then get the kernel to swap out the process completely and save the result somewhere.
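The SIGSTOP half of this already works from user space. A sketch with hypothetical helper names; it only freezes and thaws the process in memory. Getting the kernel to write the stopped process out to disk has no standard call and is not shown.

```python
import os
import signal
import subprocess
import sys


def suspend(pid):
    """Stop the process; it keeps its memory but gets no CPU until continued."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def demo():
    """Stop and continue a throwaway child; return True if it was seen stopped."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child.pid)
        # WUNTRACED makes waitpid report a stopped (not just exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)
        resume(child.pid)
    finally:
        child.kill()      # SIGKILL also works on a stopped process
        child.wait()
    return was_stopped
```

This is the programmatic equivalent of Ctrl-Z followed by `fg`. The stopped process still holds its PID, file descriptors and sockets, which is exactly the state that makes a to-disk version hard.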

  62. That will not be easy by bartman · · Score: 2, Interesting

    There are big problems with such an approach, and mainly with device usage. Basically they are all the problems that you would have with process migration add a few because of temporal discontinuity.

    If you are using a scanner, or a mouse, or whatever, that device may not be there or may not be available when the process is brought back. Furthermore you may have a file descriptor opened on a local (or network shared) file which no longer exists or has changed drastically.

    There are further non-device-dependent problems with shared memory, opened-but-unlinked files, parent PID, IPC resources.

    Having said all of the above... I suppose that for the very rare case that your program is completely memory and CPU dependent you could retire and recover a task.

    my $0.02

    --
    -- bartman
  63. Apple Tried this with OS 9 by zaius · · Score: 3, Interesting
    Apple implemented this feature in early versions of OS 9, but took it out after they realized that some laptops would never "unfreeze" without the user hitting a reset switch buried deep inside the laptop.

    The idea was that when you put your computer to sleep, instead of keeping the SDRAM (or whatever the laptop had) powered to preserve the memory contents, it would write it all to a special sector on the hard drive that the firmware knew to read from when starting from sleep. This allowed sleep to be even more low-power than it already is, since a hard drive does not require power to retain data.

    1. Re:Apple Tried this with OS 9 by Hadlock · · Score: 1

      i think most modern portable macs (~'93 on) all use the SO DIMM standard. i know the 240 duo series did, and my powerbook g4 does. of course, the older powerbooks can't accommodate the 1 gig my PB can ;)

      --
      moox. for a new generation.
  64. EPCKPT by cmason · · Score: 5, Informative
    EPCKPT is a checkpoint/restart utility built into the Linux kernel. Checkpointing is the ability to save an image of the state of a process (or group of processes) at a certain point during its lifetime.

    --
    "If you are an idealist it doesn't matter what you do or what goes on around you, because it isn't real anyway."-R.P.W.
  65. This would be useful for more than just blackouts by EccentricAnomaly · · Score: 2

    If you could sleep processes you could run some intensive job at a high priority when you're not logged into your workstation and then sleep the processes when you log in. This way you could run some job that takes weeks or months but not bog down a workstation that you need for doing daily work on.

    Yeah, you could "nice" down the process so that it doesn't slow things down while you're logged in... but then system processes at higher priorities might slow down your number crunching when you're not logged in... It'd be best to be able to run it at high priority at night only.... ya know, use those unused cycles.

    --
    There are 10 types of people in this world, those who can count in binary and those who can't.
  66. Application level solution. by Mark+Imbriaco · · Score: 2, Interesting

    One fairly simple alternative is to simply have the application save its own state to a "checkpoint" file periodically. This approach has been used in other applications for a long time in the form of auto-save files (ie: emacs) and would be easily adapted to a long running program like the one you describe.

    Just because the OS doesn't support it automagically it doesn't mean that you can't solve it for yourself with a little bit of extra work and planning.

  67. Software suspend by Timbo · · Score: 2, Informative

    Linux software suspend may be of interest.

  68. Software Suspend by Anonymous Coward · · Score: 0

    Take a look at this http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

  69. Great for high end biz systems during off hours... by Saltine+Cracker · · Score: 1

    Generally most Laptops can do this, but I think what the poster is going for is a tool which will hibernate a single process. I think this is a very useful idea.

    For instance, what if your company runs 1 shift, and you're sitting there thinking, now what could I use this IBM zSeries Linux server for at night...how about trying to factor the RSA-2048 number?...but your implementation of the General Number Field Sieve algorithm consumes massive resources, so you want to hibernate the process during business hours and wake it up at night when the boss goes home without having to start all over. Then 50 years later when the process is finished you'll have your $200,000 prize from RSA.

  70. Re:This CAN be trivially done on any un*x i know.. by zaius · · Score: 2

    So, you mean that the next time my app segfaults and dumps core, I can say it was a feature designed to allow it to be restarted...? Cool. Seriously though, how can you restart a core (obviously not one from a segfault) using gdb?

  71. Volgons? by wiredog · · Score: 3, Offtopic

    The bastard children of Vogons and Vorlons?

    1. Re:Volgons? by crawling_chaos · · Score: 2

      I wonder what their poetry sounds like? At least it would be set to music, I suppose...

      --
      You can only drink 30 or 40 glasses of beer a day, no matter how rich you are.
      -- Colonel Adolphus Busch
    2. Re:Volgons? by Spunk · · Score: 1

      YOU ARE MOSTLY NOT READY.

  72. Use VMware by gmkeegan · · Score: 1

    The vmware window has a freeze/suspend button that will let you freeze the session and resume later. Taking that a step farther, you can even copy the files for that virtual machine to another host, start vmware back up, and execution will resume right where it left off. A number of Linux/BSD/Win os's supported, too.

  73. Also in Win2k and ME by torklugnutz · · Score: 1

    Hibernation is great. Much faster boot up is the end result. C'mon, if MS can implement it smoothly, it must be possible in UNIX/LINUX/BSD. It's invaluable for laptops, somewhat less for desktops, and negligible for servers, except in this guy's situation.

    That said, who's gonna have the foresight to NOT strip this feature out of your own install to conserve server resources? Doh!

    --
    Often in Error, Never in Doubt.
    1. Re:Also in Win2k and ME by Tim+Stadelmann · · Score: 1

      I used to think hibernate was a handy feature when I bought my laptop, but in practice I found it actually takes considerably longer than booting up.

      Maybe the software implementation in Windows is faster in some cases because it only saves memory that is allocated? But then again, why would you need to hibernate a computer if there is hardly anything running on it?

  74. Where are the fascist editors hiding? by Anonymous Coward · · Score: 0

    I'm afraid that this is clearly -1 OFFTOPIC, even if the HHGTTG reference does make you wet your pants with glee. Pull out those mod cannons!

  75. App Specific "Resume" by 4of12 · · Score: 2

    Long ago and far away (about 15 years ago) I recall that TeX was frequently built in a fashion that required running the binary on some "initialization" information. That process took some nontrivial amount of time back in those days (I'm sure now it would be an eyeblink), and the program could be made to \dump its state in some way.

    Then, when you ran TeX in everyday circumstances, the digested initialization file was read in by the application as part of the usual startup process.

    I'm probably botching the explanation of how this really worked, but I guess my point is that the "resume" function had to be coded into the specific application.

    --
    "Provided by the management for your protection."
  76. save partial results and states to a file by Anonymous Coward · · Score: 0

    A student at my lab who needed several days to run his simulations got tired of network outages, unscheduled reboots and such, wiping out his results. So he redesigned his programs to save partial results and states to a file. If he had to restart his sim, it took up where it last saved.

  77. From the app level, it w(sh)ould be like SETI@HOME by FormerComposer · · Score: 1

    Seti@Home takes about 65 hours per data packet on my machine -- of course with Win98 there are almost daily requested or insisted upon shutdowns. The designers obviously anticipated power-offs (intended or not) and dealt with it. I think the apps that require such runtimes should be designed to deal with such exigencies.

    --
    For most purposes, 355/113 is close enough.
  78. RTMF? by russianspy · · Score: 1

    I don't know what the issue is. *nix can swap processes to disk. It'll save all of the info in a file (just like a core dump). Solaris can suspend everything (its entire state) and recover that later. I'm pretty sure I've heard my friends talk about the same feature under Linux as well...
    Saving a process (all of its pages) has been around for a very long time.

    On the other hand. If you have a program that takes days/weeks/months to finish (I do quite frequently) you need periodic checkpoints. There is no way around it. If you're talking about weeks/months - upload those checkpoints to another computer over the net - or burn a CD. The cost of $.50/CD disk is nothing to the loss of a month of computation.

    1. Re:RTMF? by Higher+Authority · · Score: 1

      Don't you mean RTFM?

  79. Windows 2000 and Hibernation by doorbot.com · · Score: 5, Informative
    If you have a Windows 2000 or XP machine you can enable hibernation. However, this is not a "power management" feature... it has been separated from ACPI and/or proprietary disk partitions and will work on all computers, even servers, whether they have ACPI/APM/nothing for power management.

    Once you've enabled it, you create a hibernation file on the C: drive. Hibernation should only take place when there is minimal disk activity (eg, don't hibernate while trying to save your Word document). The system saves the contents of RAM to the hard drive, and then shuts down. When the machine boots, a flag was set (I assume) indicating the system should resume from hibernation... so the hibernation file is read from disk and written to RAM and you're back up and running, in less time than it takes to boot. Plus it keeps your uptime from resetting back to zero.

    Some things to note:

    You will need WHQL certified drivers, or at least properly-written drivers. I have a SB Audigy and the first drivers I used (the ones on the included CD) caused a blue screen on resume from hibernation. When an updated driver was released, it fixed this issue.

    Applications need to be properly-written as well, as there is some sort of Win32 suspend signal that is sent to apps just before the system hibernates, so the app must support this and the resume command when the system is restored.

    Hibernation works great on my laptop and on my workstation, and I especially like the fact that I don't need to create a separate partition or install special drivers to make it work (you can even use it on an NTFS formatted drive).

    1. Re:Windows 2000 and Hibernation by Joe+U · · Score: 2, Funny

      Creative releasing drivers that cause a bluescreen?

      Who would of thought it was possible.

      Rule 1 with hibernation, no creative products.

    2. Re:Windows 2000 and Hibernation by Anonymous Coward · · Score: 0

      Most of the problems I've had with Creative revolve around their complete lack of support for SMP on Windows 2000/XP.

      However, I found the interrupt affinity driver from Microsoft and forced the drivers to use only one cpu... problem solved.

    3. Re:Windows 2000 and Hibernation by Anonymous Coward · · Score: 0

      Who would of thought it was possible.

      Who would HAVE thought it was possible.

    4. Re:Windows 2000 and Hibernation by dublin · · Score: 3, Interesting

      This is not strictly speaking a W2K function. The real kicker here for Linux folks is that the easiest way to do hibernation in the modern world is to use ACPI, which Linux doesn't do very well. (See this week's LWN for a timely discussion.)

      APM BIOSes can also do this, but they aren't as standard: Often the implementation details are specific to the hardware. For instance, Phoenix BIOSes (at least as of two years ago, I haven't messed with this stuff much since then) tend to want to put the STD (suspend-to-disk) data in a special file in a Windows partition, while some others (Dell for sure, since I used to work this stuff for them) save this info in a special STD partition (type 84, IIRC) which is a more generic solution, but requires more knowledge when setting up the box. (When was the last time you thought you might need an STD partition when building your box? BTW, they should be at a minimum, PhysicalMemorySize + 1 MB for state info, video register settings, etc.)

      --
      "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post
    5. Re:Windows 2000 and Hibernation by doorbot.com · · Score: 3, Interesting

      This is not strictly speaking a W2K function.

      Agreed, and as you go on to explain, and I believe I alluded to in my post, there are many proprietary implementations via the BIOS or DOS drivers, etc.

      My point was that Windows 2000 separates the hibernation feature from the BIOS. As far as the BIOS can tell, the system is booting normally... but once the BIOS loads the NTLDR, Windows takes over of course and handles the hibernation. This is why it works so well and does not have all of the "stupid issues" such as custom drivers, partitions, or the like. The end result is not a MS-only function, but the implementation is, as far as I can tell.

    6. Re:Windows 2000 and Hibernation by denzo · · Score: 3, Interesting
      Not according to Microsoft (on their knowledgebase). This article states that Win2k needs ACPI to support OS hibernation, and that the BIOS has to support it. Although Microsoft has been known to contradict itself.

      And simply having WHQL-certified drivers doesn't necessarily mean it'll work. I had a Future Domain SCSI controller in my computer that loaded with the default Win2k WHQL driver, but I could never hibernate it. When I swapped it out with an Adaptec 2940UW, I was able to enable Hibernation in my Control Panel settings.

    7. Re:Windows 2000 and Hibernation by EvlG · · Score: 2

      The need for 100% kosher drivers and apps is the real kicker here.

      Lots and lots and lots of people don't have great (or even good) drivers for some hardware.

      Apps suck even more - the whole Windows platform is full of the people doing X, Y, and Z in different ways to skirt different OS bugs or other pet peeves they didn't want to deal with.

      I've never gotten Hibernate to work properly for just those reasons - apps and drivers on windows suck.

    8. Re:Windows 2000 and Hibernation by Anonymous Coward · · Score: 0

      Thanks for that contribution. You're my hero.

    9. Re:Windows 2000 and Hibernation by cb0y · · Score: 0

      I always hated how hibernation needed the same disk space, ie 512meg ram, needed a 512meg file, this is completely lame.

      Why can't it
      1) save only "USED" ram, and mark unused with 0000000000s
      2) compress gzip it all at the same time.
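
      A rough sketch of why both ideas pay off, assuming a made-up 1 MiB "RAM image" in which only three pages are in use. This is not how Windows 2000 writes its hibernation file; zlib stands in for gzip-style compression, and PAGE_SIZE, build_image and compress_image are names invented for the illustration.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size for this illustration


def build_image(total_pages, used_pages):
    """Return a fake RAM image: random data in used pages, zeros elsewhere."""
    image = bytearray(total_pages * PAGE_SIZE)  # starts out zero-filled
    for page in used_pages:
        start = page * PAGE_SIZE
        image[start:start + PAGE_SIZE] = os.urandom(PAGE_SIZE)
    return bytes(image)


def compress_image(image):
    """Deflate the image; long runs of zeros shrink to almost nothing."""
    return zlib.compress(image, 6)


image = build_image(total_pages=256, used_pages=[0, 7, 42])
saved = compress_image(image)
print(len(image), "bytes in RAM ->", len(saved), "bytes on disk")
```

      The random pages barely compress at all, so the saving here comes almost entirely from the unused (zeroed) pages.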

    10. Re:Windows 2000 and Hibernation by Peyna · · Score: 1
      If drivers on windows suck, name me an operating system with good support for all the hardware out there today. At least Windows recognizes every piece of hardware on my computer (internal and external) with no problem, and I have not had any problems at all with any of my hardware.

      As far as redhat or other linux's go, it wasn't until just recently that my USB mouse would even work, and it randomly stops working as it is. If I boot up with a disk in my zip drive, it complains. Redhat 7.2 was the first to fully recognize my video card, etc, etc, etc.

      One more complaint, why can't X figure out for itself what my monitor's specs are? Windows sure doesn't have any problems....

      --
      What?
    11. Re:Windows 2000 and Hibernation by Anonymous Coward · · Score: 0

      MS can't include gzip without making the source available. Integrating it into W2k would be a big no-no.

    12. Re:Windows 2000 and Hibernation by deltavivis · · Score: 1

      I've never had a problem once using hibernate on win2k, i'll hibernate out my laptop everynight for a month or two and never actually reboot.

      The funniest thing i ever noticed was when i hibernated while a CD was playing in winamp. When i restarted the CD started playing before i logged in.

    13. Re:Windows 2000 and Hibernation by dublin · · Score: 2

      FWIW, I don't think this is Windows-only. Hibernation should work in any OS that understands how the APM or ACPI BIOS APIs work. Sadly, the only Linux I've found that even comes close to understanding these correctly is Corel. (They actually did a great deal of the "hard stuff" right - I hated to see them fade away before making their mark...)

      (FWIW, I prefer to use the terms "suspend-to-RAM" (S2R or STR) and suspend-to-disk (S2D or STD), since there's no ambiguity about what's going on that way, as there can be with terms like sleep, suspend, snooze, and hibernate.)

      --
      "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post
    14. Re:Windows 2000 and Hibernation by Joe+U · · Score: 1

      Sorry, should have read

      Who woudda thought it was possible.

      Emphasis on the NY accent.

  80. Process-saving is known, but not what you want by Seth+Finkelstein · · Score: 4, Informative
    The idea of saving the state of a process is very well-known. Take a look at anything from emacs dumping to the gcore(1) program. It's been used in everything from saved games of Rogue to saved states of PERL.

    But isn't it overkill for a data-crunching operation? As many other people have noted, it would seem you're much better off checkpointing your data to disk, rather than relying on low-level OS process wizardry.
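
    For what it's worth, application-level checkpointing can be quite small. A minimal sketch, assuming the job's whole state fits in a picklable dict; save_checkpoint, load_checkpoint and crunch are hypothetical names, and the sum of squares is only a stand-in for the real calculation:

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        pickle.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the bytes reach the disk
    os.replace(tmp_path, path)  # atomic swap on POSIX filesystems


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"next_i": 0, "total": 0}


def crunch(path, limit, every=1000, stop_at=None):
    """Sum squares below `limit`, checkpointing every `every` iterations."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], limit):
        if stop_at is not None and i == stop_at:
            return None  # simulate the power going out mid-run
        state["total"] += i * i
        state["next_i"] = i + 1
        if state["next_i"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

    Writing to a temp file and renaming means a power cut during the save leaves the previous checkpoint intact; at worst you redo the iterations since the last save.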

    Sig: What Happened To The Censorware Project (censorware.org)

    1. Re:Process-saving is known, but not what you want by bfields · · Score: 2
      But isn't it overkill for a data-crunching operation? As many other people have noted, it would seem you're much better off checkpointing your data to disk, rather than relying on low-level OS process wizardry.

      Perhaps the process is running software he didn't write, in which case this might not be so easy.---Bruce F.

  81. Emacs is the answer of course by Anonymous Coward · · Score: 0

    Actually Emacs has been doing this for a long time, believe it or not.

    If you've ever built Emacs, you'll see that at one point the makefile runs emacs. Well, it runs an executable that shows a print out GNU Emacs, (c) etc., loads a whole bunch of things and then ... core dumps!

    It seems that Emacs does a lot of stuff when loading, meaning it would take forever to run. To get around this the developers decided to have it load partially during build, setting up everything once and then do checkpointing. They dump the state of the app (which is what a core dump does) and then build an executable that just loads the core dump and continues it. That's why Emacs loads so ... quickly, I guess.

    I don't know if you can do a similar thing to other programs via GDB, but I'm sure it is possible to build these things into each program.

  82. Already available for Linux by HishamMuhammad · · Score: 2, Interesting

    There is a kernel patch to do this. It's called Software Suspend. It is also part of the FOLK project (Functionality Overloaded Linux Kernel, a project to merge the largest possible amount of patches into the kernel).

  83. Bad coding? by oolon · · Score: 2

    Surely if this process takes so long to execute the person who wrote it should have made it save its state every once in a while. Problems like these could have been avoided! Setiathome to name but one does exactly this.

    James

  84. You sure of that? by bastion_xx · · Score: 4, Funny

    My Intel processor puts it somewhere around 41.99999999967

    1. Re:You sure of that? by Anonymous Coward · · Score: 0

      No, you're wrong. My AMD processor says it's... oops, the heatsink fell off and the CPU fried itself.

    2. Re:You sure of that? by Anonymous Coward · · Score: 0

      I'd tell you what my Cyrix CPU said, but it was sold during the bankruptcy liquidation.

    3. Re:You sure of that? by Sj0 · · Score: 2

      No, you're wrong. My AMD processor says it's... oops, the heatsink fell off and the CPU fried itself.

      Sorry, that's AMDs fault right there. They shouldn't be letting retards with no motor skills try to put together a modern PC. BAD AMD! BAD!

      People should really lighten up -- I suppose you'd be blaming AMD if a power surge caused your power supply to send 1000V DC into your board, destroying all the components? Electronics are designed to be used in a certain way(ie. a CPU which runs at such a high speed, and does more computations than the entire computing world 25 years ago runs hot, and was designed to run with a heat sync), and if someone isn't functional enough(alternate:Is mentally retarded or is suffering from parkinsons disease) to ensure that their heatsync doesn't fall off, maybe they shouldn't be allowed around complex electronics.

      Oops! My heatsync fell off and shorted out my video card, destroying it. BAD NVIDIA! BAD!

      --
      It's been a long time.
    4. Re:You sure of that? by Anonymous Coward · · Score: 0

      I'd tell you what my Itanium said, but it's still crunching on the result after 42 hours...

  85. Software Suspend for Linux by ernte23 · · Score: 1

    http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

  86. Re:This CAN be trivially done on any un*x i know.. by xyzzy · · Score: 3, Informative

    You can't. The previous poster was making it sound too easy. Real checkpointing needs to save Kernel state as well -- file handles, device driver state, you name it. It isn't as simple as saving the in-memory image of the process.
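
    To illustrate the file handle part: a memory image alone does not remember where the kernel had each file positioned. A user-space sketch, assuming the process only holds ordinary files open for reading; snapshot_files and restore_files are hypothetical helpers, and sockets, pipes and device state have no equivalent this simple:

```python
import os
import tempfile


def snapshot_files(open_files):
    """Record the path and current offset of each open file object."""
    return [(f.name, f.tell()) for f in open_files]


def restore_files(snapshot):
    """Reopen each file and seek back to the recorded offset."""
    restored = []
    for path, offset in snapshot:
        f = open(path, "rb")
        f.seek(offset)
        restored.append(f)
    return restored


# Demo: read part of a file, "freeze", then pick up at the same byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    demo_path = tmp.name

original = open(demo_path, "rb")
first_half = original.read(4)
saved = snapshot_files([original])
original.close()  # the process "dies" here

(resumed,) = restore_files(saved)
rest = resumed.read()  # continues from byte 4, not from the start
resumed.close()
os.remove(demo_path)
```

    Even this only works if the file still exists, unchanged, at the same path when the process is thawed, which is exactly the kind of state a real checkpointer has to worry about.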

  87. Cryogenic freeze / Hibernation by Dr_Marvin_Monroe · · Score: 2, Interesting

    I think that this might also be a really good bug fix/hacking tool. I can also remember something like this for the Apple II in years gone by. You could press a button and take a snapshot of all memory in the system. Then you could write the executable part to disk and pick up where you left off. Good for freezing a copy of a game or whatever.

    This would also be good for tracking down bugs using the "before and after" technique.

    Such a program could be tied into the UPS monitor in such a way as to save everything that couldn't be stopped.

  88. CDC Cyber 205 by epepke · · Score: 5, Interesting

    As usual, this is ancient. Back at FSU, we had a CDC Cyber 205, a vector pipeline supercomputer, back in 1985. Any process could be crashed for a shutdown, and it produced a file that worked exactly like an executable and resumed computation from the time it was crashed.

    1. Re:CDC Cyber 205 by Anonymous Coward · · Score: 0
      Why is this modded as offtopic? It's interesting and related to the subject..

      Uncover the Slashdot Moderation Conspiracy!

  89. Persistant Operating Systems by avdi · · Score: 1
    A number of projects have worked on making "persistent" operating systems which could save their entire state across powerdowns. The TUNES project is one that comes to mind. There are other projects that are more concrete and farther along. A quick google search turns up this page, among many others.

    Actually, if you want to play with a persistent programming environment, download a Smalltalk environment. Smalltalk environments are able to serialize themselves to image files. When an image is subsequently loaded, the state of all the objects in the system at the time of serialization is restored.

    --
    CPAN rules. - Guido van Rossum
    1. Re:Persistant Operating Systems by Anonymous Coward · · Score: 0

      Another way to make an operating system running on bix boxen persistant is to sumply purchase a 4+ kilowatt diesel generator that cuts in when power goes out... :)

    2. Re:Persistant Operating Systems by Anonymous Coward · · Score: 0

      sigh.... s/bix/big iron unix/g

  90. How hard could this be to experiment with? by Nelson · · Score: 5, Interesting
    I've thought about this for booting issues. I have a server that's all journaled and everything and it periodically gets bumped. Boot time is still on the order of 2 to 4 minutes for a full Linux server install. With my current stats that means I'm probably going to miss a hit or two on one of the web pages, all things being equal. A good portion of that is just icing though, things that are there "just in case" or get used infrequently. (Okay, I can screw with the init order and the problem essentially goes away or I can switch hardware but we're nerds and geeks so let's just explore this)


    I was thinking about this and here was my dirty hacky idea. You need kexec, lobos, or something similar (actually a fairly modified version of it) you'll need on the order of 8MB of disk space and some kernel mods, which might not be that extensive.


    I was thinking we develop some driver or process that consumes all of the memory and CPU in a system. It forces all of the processes to swap out; it would probably need to be a driver of sorts on current linux systems. Then it could dump the kcore out to a file somewhere, sync it, and hibernate. Then when the kernel boots up, if the right arg is passed in it could either load this image back into RAM in place of the kernel and then jump into it (easier said than done) early in the boot (page tables are made long before you have access to the drives and such so the logistics of this would need to be figured out) or it could boot up and use a different swapper partition and then have some kind of tool like kexec to load that image back into RAM and start it up. Or something; somehow you should be able to recover the state of the system. File handles and everything would be there.


    The harder part would be hardware and network transparency. You'd need to modify all of your drivers to make sure that the hardware could be reset and they could deal with it. I think it's a little easier for the network side because it would be similar to simply unplugging the network cable: you have open sockets that are talking to nothing and some software can deal with that pretty well. There is also some kind of system integrity or robustness piece that is needed: if the system somehow changes when you bring your old image back it could break things, munge files, etc.

    1. Re:How hard could this be to experiment with? by jmy · · Score: 1

      And how would the hardware be initialized? You can't just wave a magic wand and have NIC:s and soundcards just work...

      You would basically have to run the boot process as usual to the point where init starts. Sure you could probably save some time by restoring processes started by init your way, but then again, those processes could very well have messed with hardware and things you can't possibly restore before they were swapped out.

      I REALLY don't think you'd want to go there, there are just too many problems...

  91. doesnt SETI@home do this, sorta? by Pharmboy · · Score: 3, Informative
    seti@home kinda does it.

    the seti@home client uses its *.sah files to save the state of a calculation. of course, this is program dependent, not OS dependent. I guess if you have the source files for the program doing the counting.....

    --
    Tequila: It's not just for breakfast anymore!
  92. STANDALONE CONDOR CHECKPOINTING by Anonymous Coward · · Score: 5, Informative

    STANDALONE CONDOR CHECKPOINTING:

    Using the Condor checkpoint library without the remote system call functionality and outside of the Condor system is known as
    "standalone" mode checkpointing.

    To link in standalone mode, follow the instructions for linking Condor executables, but replace condor_syscall_lib.a with libckpt.a. If you
    have installed Condor version 5.62 or above, you can easily link your program for standalone checkpointing using the condor_compile
    utility with the little-known "-condor_standalone" option. For example:

    condor_compile -condor_standalone <compiler> [options/files....]

    where <compiler> is any of cc, f77, gcc, g++, ld, etc. Just enter "condor_compile" by itself to see a usage summary, and/or refer to
    the condor_compile man page for additional information.

    Once your program is relinked with the Condor standalone-checkpointing library (libckpt.a), your program will sport two new command
    line arguments: "_condor_ckpt <ckpt_file>" and "_condor_restart <ckpt_file>".

    If the command line looks like:

    exec_name -_condor_ckpt <ckpt_file> ...

    then we set up to checkpoint to the given file name.

    If the command line looks like:

    exec_name -_condor_restart <ckpt_file> ...

    then we effect a restart from the given file name.

    Any Condor command line options are removed from the head of the command line before main() is called. If we aren't given
    instructions on the command line, by default we assume we are an original invocation, and that we should write any checkpoints to the
    name by which we were invoked with a "ckpt" extension.

    To cause a program to checkpoint and exit, send it a SIGTSTP signal. For example, in C you would add the following line to your code:

    kill( getpid(), SIGTSTP );

    Note that most Unix shells are configured to send a TSTP signal to the foreground process when the user enters a Ctrl-Z. To cause a
    program to write a periodic checkpoint (i.e., checkpoint and continue running), send it a SIGUSR2:

    kill( getpid(), SIGUSR2 );

    In addition to the command-line parameters interface described above, a C interface is also provided for restarting a program from a
    checkpoint file. The prototypes are:

    void init_image_with_file_name( char *ckpt_name );

    void init_image_with_file_descriptor( int fd );

    void restart( );

    The init_image_with_file_name() and init_image_with_file_descriptor() functions are used to specify the location of the checkpoint file.
    Only one of the two must be used. The restart() function causes the process image from the specified file to be read and restored.
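
    The "signal in, checkpoint out" pattern described above can be imitated without Condor. The sketch below is a plain Python stand-in, not the Condor library or its API: it only saves a small dict rather than the whole process image, it assumes a POSIX system where SIGUSR2 exists, and on_sigusr2 and run are hypothetical names.

```python
import os
import pickle
import signal

checkpoint_requested = False


def on_sigusr2(signum, frame):
    """Only set a flag; the real work happens at a safe point in the loop."""
    global checkpoint_requested
    checkpoint_requested = True


signal.signal(signal.SIGUSR2, on_sigusr2)


def run(steps, ckpt_path, signal_at=None):
    """Count to `steps`, writing a checkpoint whenever SIGUSR2 arrives."""
    global checkpoint_requested
    state = {"step": 0}
    while state["step"] < steps:
        state["step"] += 1
        if state["step"] == signal_at:
            # Same idea as kill( getpid(), SIGUSR2 ) in the C example.
            os.kill(os.getpid(), signal.SIGUSR2)
        if checkpoint_requested:
            checkpoint_requested = False
            with open(ckpt_path, "wb") as f:
                pickle.dump(state, f)  # checkpoint and keep running
    return state["step"]
```

    From another shell, "kill -USR2 <pid>" would trigger the same handler. Keeping the handler down to setting a flag avoids doing file I/O in the middle of whatever the main loop happened to be executing.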

  93. Search in the slashdot archives for kernel patches by Alan · · Score: 5, Informative

    I think it was somewhere in the list of patches from the -mjc tree (see here) that there was a patch for the entire kernel for linux. Basically it let the system save its state, and then restore it if it detects that it was shut down at that point. I'm not sure if this is what you want (and I couldn't get it working), but it's certainly a step in the right direction to what you're looking for.

    Just found it here, it's the 'swsusp' patch.

  94. Re:This CAN be trivially done on any un*x i know.. by ethereal · · Score: 1

    I dunno about GDB, but you can do this on command with the "abort" call and the "undump" command. While in your program, call abort(). Run undump on the core file to get an executable. When you run the executable it starts exactly where it left off at the abort().

    details here

    Woops, after reading that it sounds like it starts off at the top of main() again. But, if you had a flag to indicate where you'd aborted from, you could jump to that immediately and resume operations.

    It's a cool little trick; unfortunately I've not yet gotten to use it for anything :)

    --

    Your right to not believe: Americans United for Separation of Church and

  95. Java has lightweight persistence... by bernz · · Score: 2, Interesting

    If you utilize the java.io.Serializable stuff right, you can create a lightweight persistence and should be able to freeze and resume processes on the same application if you handle threading right with it.

  96. the problem with sleep processes... by Triv · · Score: 1

    ...is that way too often they don't wake up when you want them to. I've seen this happen on macs as well as compaq boxes. It's an annoyance when a reboot is required - it's even more annoying when you're in the midst of a huge calculation/rendering job and there's a power problem. Hypothetically, You freeze/sleep/whatever your system. Once the crisis has been averted you wipe the sweat from your brow and breathe a sigh of relief, hit the little pulsating moon button and watch as your computer does...nothing.

    "The answer to life, the universe and everything is...is...[zzzzzzt]"

    "We're going to get lynched, do you know that?"

    --Triv

  97. Doesn't matter... by Anonymous Coward · · Score: 2, Funny

    The answer would have been 42 once the processing was complete. So who cares? Get a bigger UPS :-)

  98. Darwin/MacOS X by Duck_Taffy · · Score: 4, Informative

    Here's a mutation of FreeBSD that can do exactly that. I've put my laptop to sleep in the middle of installing software while running MacOS X and brought it back up several hours later to resume installation with no problems. The same function works on my G4 tower. Yes, it does drop network connections. However, it does use a trickle charge to power the LEDs and presumably to keep the processor alive, and possibly some memory. Paging several hundred megabytes in a couple of seconds would be quite the task! One item of note is that all Apple machines have a special piece of hardware known as the PMU (Power Management Unit). In the desktops, it's parted out onto the motherboard and into the power supply, but in the laptops it's a separate card which controls both sleep and the charging of the battery. Perhaps other UNIX machines would need a similar device for this function to work properly.

    --
    Karma: Ran over your dogma.
  99. Yeah, CDC's NOS/BE could do this 25 years ago by theosch · · Score: 1

    If memory serves me right. It even was called 'checkpointing' already. Although - I never used this feature.

    NOS/BE = Operating System of CDC (Control Data Corporation) for their CDC6600 and Cyber systems.

    Ah, those were the days...

    1. Re:Yeah, CDC's NOS/BE could do this 25 years ago by GigsVT · · Score: 1

      So... the question is... Why have we already lost so much computing wisdom?

      Why are software techniques shit today compared to yesterday?

      Why are several of my C64 games better than modern games in terms of playability and even in music and sound in some cases?

      Are we losing what we once had, so quickly?

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:Yeah, CDC's NOS/BE could do this 25 years ago by Anonymous Coward · · Score: 0

      I blame Microsoft.

    3. Re:Yeah, CDC's NOS/BE could do this 25 years ago by swb · · Score: 3, Insightful

      Why are software techniques shit today compared to yesterday?

      Because we're hopelessly caught up in trying to reinvent a somewhat limited computing paradigm (unix). No one, except for some CompSci projects that never really go anywhere, has any real interest in making a new operating system that builds on the lessons of all the previous operating systems and includes reasonable features like process checkpointing/suspension.

      I'd bet there are patent considerations as well -- maybe many of the good OS features are not reproducible due to existing patents.

    4. Re:Yeah, CDC's NOS/BE could do this 25 years ago by LatJoor · · Score: 1

      Why are several of my C64 games better than modern games in terms of playability and even in music and sound in some cases?

      I believe that there are two reasons for this.

      1. You're used to the old games and the nostalgia makes you enjoy them more.

      2. There were LOTS of terrible games on old platforms. You don't see them anymore because they've all fallen by the wayside. Meanwhile, all of today's blunders are still around in plain view.

      I don't think that it has anything to do with old games being inherently better. In some ways today's games are far better. For example, in terms of artwork modern games are far superior, both because graphics capabilities are better and because professional artists are involved in their design.

    5. Re:Yeah, CDC's NOS/BE could do this 25 years ago by Anonymous Coward · · Score: 0

      Sure, they had a big part in it.

  100. You could do this with VMware by IGnatius+T+Foobar · · Score: 1, Redundant

    You could do this with VMware. Run another copy of Linux inside a VM, and suspend the VM when you need to shut the box down for a while. Very simple.

    This is not the most efficient way to use a computer, of course -- you'd probably want to dedicate your resources to the application instead of to a virtual machine environment -- but technically this does get the job done.

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
  101. problematic by S.+Allen · · Score: 2

    Easier said than done. If this wasn't part of the application's design or if it's relatively sophisticated, making these changes can be non-trivial. And (shock/horror) if you don't have the source code, it's impossible without OS assistance.

    1. Re:problematic by cweber · · Score: 1

      Yeah, maybe it is easier said than done, but if you CAN do it at all it is going to be faster, more reliable and more elegant than OS-supplied checkpoint/restart.

      Given source code, you most likely know exactly what your app needs to save at any given moment to restart. The OS however knows nothing and needs to save EVERYTHING remotely related to your app.

      I have experience with checkpointing under SGI IRIX, and while it is nice to have, it sucks bigtime compared to those apps where builtin checkpoint/restart is available.

    2. Re:problematic by dakoda · · Score: 1

      hmm, that's interesting. i'm not too familiar with irix, but some of its features just seem so cool =) what about implementing it as an extension of signals, in that a process can get SIGHBRN8, and if it's not handled by the app explicitly (ala sigaction()), the os can just do a quick and dirty save-everything method. that gets the best of both worlds: every app _can_ hibernate, and specialized ones can do it better, and not have the inconsistencies of implementing both independently (os hibernating a process that is preparing to hibernate w/o os hooks etc)

    3. Re:problematic by bbqdeath · · Score: 1

      That's good to solve a specific problem, and I appreciate that tactic. But for my own personal stuff, I prefer checkpointing based on the assumption that at any time the power or even critical hardware can fail without warning. It's overkill in terms of the scheduling, because technically I only _need_ to save right before the app encounters a fatal exception, the power blinks out, or the motherboard smokes, but the code to do the checkpointing is usually related to test code I write anyway to work with serialized test data when I'm developing the apps, so it's often relatively simple to add as I go if I remember.

    4. Re:problematic by cheezehead · · Score: 1

      Yes, obviously you need the source.

      On a related matter: saving state doesn't help if your hard disk crashes, or something else goes seriously wrong with your hardware (as opposed to a power outage). How about saving the application state on a different machine, i.e., uploading it to some remote server or whatever? Disaster recovery is then possible by downloading the state to a different machine (of course you'll need a backup of your application as well) and resuming the calculation.

      --

      MSN 8: Now Microsoft even has bugs in their ad campaigns.

  102. No, suspend by Shimmer · · Score: 1

    That sounds more like "suspend" than "hibernation". When you hibernate a Windows box, it writes its entire RAM image to disk and turns off. When you turn the box back on again, it actually has to boot the image back into RAM.

    In your situation, some power is still necessary to maintain the RAM while the lid is closed.

    -- Brian

    --
    The most rabid believers in American Exceptionalism are the exact same people whose policies are destroying it.
    1. Re:No, suspend by Proteus+Child · · Score: 1
      That sounds more like "suspend" than "hibernation"...

      Aah.. my glitch. Thanks for the correction.

      --

      Proteus' Child

      Doko ni datte; hito wa, tsunagette iru.

  103. UNICOS by trandles · · Score: 1

    UNICOS has been doing system checkpoints for years. Checkpoint the system, shut it off, turn it on, restart from checkpoint, everything is exactly as it was when originally checkpointed.

  104. Interleaf did this for a quick start by emptybody · · Score: 1

    you could save a quick start file which was a snapshot of the running program. To start up again later, it would just read that snapshot and push the whole bloody thing into RAM. Once completed you were running.

    it was the difference between a 10 minute startup or a 1 minute startup.

    --
    comment directly in my journal
  105. Microsoft by volpe · · Score: 0, Offtopic


    Why can't I freeze down the process and thaw it back up at a later time? It ought to be possible to take all the connected memory pages and save them in some way, preserve file handles and pointers, and everything. Maybe net-connections would die, but that's understandable. Has any work been done in this field?


    Yes, in Redmond. It's called "Hibernate Mode", and it's been around for a while now. If the truth hurts, go ahead and mod me down. My karma's capped at 50 right now anyway.

  106. Solid-state memory by kenneth_martens · · Score: 2

    I think this problem is more easily solved in hardware than in software. With recent advances in solid-state memory, hopefully a standard can be worked out so that solid-state memory can replace or complement volatile memory (i.e., RAM as we know it.) Solid-state memory would survive a power outage, and you could pick up where you left off.

    The disadvantages are speed (solid-state memory is getting faster all the time, but it is still slower than volatile RAM), cost, and lack of current standardized implementations (I'm not even sure there are any working implementations.)

    For some background research in solid-state memory, check out this site (it's a bit old, but still interesting).

    1. Re:Solid-state memory by Anonymous Coward · · Score: 0
      Solid-state memory! Yes, that's much better than the vacuum-tube based memory I used to use.

      Or perhaps you meant static RAM?

      (for those not in the know, "solid-state" traditionally refers to anything transistor-based. Also, ironically enough, good old magnetic core memory was static, so people used to take it for granted that of course you could power down and restart where you left off later easily).

  107. There are some problems by Guignol · · Score: 1

    It depends on what kind of calculations you are doing,
    But if they are time-sensitive, then besides all the almost purely hardware solutions so far proposed,
    You'd also need to have os system calls that give you fake date/time related answers. (to be different of course of the 'real ones' also available)
    If you want to have per process "cryogenics" you'll also have to keep track of the different date/time status for each one of them, which could cause trouble if they communicate now and then...
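
    The "fake date/time" bookkeeping could look roughly like this. It is only a user-space sketch with a hypothetical ProcessClock class; a real implementation would have to live behind the OS time calls, and the injectable `source` is there so the behaviour can be checked without actually waiting.

```python
import time


class ProcessClock:
    """Clock that hides the time a process spent frozen."""

    def __init__(self, source=time.time):
        self._source = source      # the 'real' clock, still available
        self._frozen_total = 0.0   # seconds spent frozen so far
        self._frozen_at = None     # real time of the current freeze, if any

    def freeze(self):
        """Remember the real time at which the process was frozen."""
        self._frozen_at = self._source()

    def thaw(self):
        """Add the length of the freeze to the running offset."""
        self._frozen_total += self._source() - self._frozen_at
        self._frozen_at = None

    def now(self):
        """Real time minus every interval spent frozen."""
        if self._frozen_at is not None:
            real = self._frozen_at  # time stands still while frozen
        else:
            real = self._source()
        return real - self._frozen_total
```

    Two processes frozen for different lengths of time end up with different offsets, so timestamps they exchange will not agree, which is the communication trouble mentioned above.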

  108. Not quite by spacefrog · · Score: 1

    Not quite.

    I know of no laptop manufacturer that calls this process "hibernation". Every laptop manufacturer I know of calls this "suspend to disk" or something similar.

    The only product that I know of that calls this process "hibernation" is Windows 2000.

    Windows 2000 implements hibernation at the *OS* level. It has nothing to do with your BIOS.

    Make your standard slashbot comments about W2K, but this is a feature they got right. Since it sends the same 'ol suspend/unsuspend messages to processes, most of your apps will even reestablish their network connections without any fuss.

    I have a fairly loud multi-processor system and the misfortune of a combined bedroom/home office so hibernation is a real lifesaver for me if I want to get any sleep.

    1. Re:Not quite by Anonymous Coward · · Score: 0

      My Sony VAIO 505G calls it hibernation.

    2. Re:Not quite by Anonymous Coward · · Score: 0

      Solaris / SPARC also implements this feature entirely in the OS, though it's called system "suspend" [to disk]. I've been using this feature for years and can't remember the last time I actually had to log into my workstation or restart X at home-- it's sometimes most of a year before I have to, usually because of a power outage.

    3. Re:Not quite by roju · · Score: 1
      Make your standard slashbot comments about W2K, but this is a feature they got right.

      I really have to agree with you on this one. I've bound my power button to hibernate, so I can just punch that, and my system goes down without messing anything up.

      most of your apps will even reestablish their network connections without any fuss

      The only app that has a problem with it in my experience is ICQ. The app itself runs ok, but it doesn't realize that it lost its connection, and so it looks like you're online, but you're really not. It's actually caused me to miss important messages and such. Oh well, it's easily fixed by disconnecting and reconnecting.

  109. Hp MPE/3000 expert please give us details by masters · · Score: 1

    Would someone please give the community some insights on how this worked on the HP 3000 or HP MPE. My understanding is that everything in that operating system was a transaction. When powering on, the system would roll back to the last committed transaction and just start right back where it left off.

    With this system, the process would just start from where it left off.

    A description is in this paper: MPE/iX Transaction Manager

  110. Re:Great for high end biz systems during off hours by Anonymous Coward · · Score: 0

    hum...
    maybe ctrl-z?

  111. hmm by Hadlock · · Score: 1

    speaking from a mac's POV, i'm running mandrake 8.1 under emulation on Virtual PC 5...install under emulation takes a while, so i split it up into 3 days. You can save the entire PC's state, copy it, and run it on another computer and boot back up under that same instance. Possibly you could run a "virtual linux server" that people have been talking about in recent mainframe posts....not 100 or the likes, just one, which i would guess wouldn't be too difficult. once that works, you might be able to save the "virtual server"'s state. shrug

    which brings another thought: could you distribute an ISO of linux that was in a saved state, you just put in the CD, turn on the computer, and go. you could limit it to accessing 128 megs of ram, using a NE2000 compat. network card w/dhcp, and a standard vesa2.0 video driver. just boot up and go, no install or partitioning. write to ram as a ramdisk, but you'd lose everything when you shutdown; not bad as a dumb terminal.

    it would fall somewhat along the same principles of making a PC a game console, just stick in the disk, turn it on, and go-all the basic hardware works. sound might be an issue, I don't know of any sound standards other than soundblaster 16 support.

    --
    moox. for a new generation.
  112. It is possible...but it could be messy... by Mysticalfruit · · Score: 3, Interesting

    What if the process has forked off a bunch of children? Are you going to archive all the children at the same time? What if the process has a whole bunch of files in /tmp, are you going to roll them up into the freeze state as well? What if you're using pthreads? Are you going to keep the state for each thread? How about file pointers?

    I think the better solution is to write a new signal called "SIGFREEZE" and have programs just write code that could handle such an event. Let the program figure out how to save their own stuff.

    A good example would be a program that was calculating pi. The programmer would have to implement a signal handler that, when it received a SIGFREEZE, would stop computing and write what it's currently working on out to a file. The other thing the programmer should be doing is periodically writing their data out to a file anyway. Then the programmer should implement a command line option that would facilitate reloading from a saved state.
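
    A minimal sketch of that pi program in Python. SIGUSR1 stands in for the hypothetical SIGFREEZE, and the pi.state file name, the --resume flag and the Leibniz series are all assumptions picked for illustration.

```python
import json
import signal
import sys

freeze_requested = False


def on_freeze(signum, frame):
    """Signal handler: only set a flag, the main loop does the real work."""
    global freeze_requested
    freeze_requested = True


def leibniz_step(state):
    """Add one term of pi/4 = 1 - 1/3 + 1/5 - ... to the running sum."""
    k = state["k"]
    state["sum"] += (-1.0) ** k / (2 * k + 1)
    state["k"] = k + 1


def save(state, path):
    with open(path, "w") as f:
        json.dump(state, f)


def load(path):
    with open(path) as f:
        return json.load(f)


def main(argv, path="pi.state"):
    # --resume reloads the saved term index and partial sum.
    state = load(path) if "--resume" in argv else {"k": 0, "sum": 0.0}
    signal.signal(signal.SIGUSR1, on_freeze)   # stand-in for SIGFREEZE
    while not freeze_requested:
        leibniz_step(state)
    save(state, path)
    print("frozen at term", state["k"], "pi is about", 4 * state["sum"])
```

    Calling main(sys.argv[1:]) from a script, then sending kill -USR1 to it, writes the state file; starting it again with --resume picks up at the saved term.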

    That's my take on it...

    If you see any problems with it... bring it on.

    --
    Yes Francis, the world has gone crazy.
    1. Re:It is possible...but it could be messy... by ameoba · · Score: 2

      Uhh... Why bother with a new signal when you can just write the program to save a checkpoint when receiving one of the normal ones? It's not like handling signals is -that- hard.

      --
      my sig's at the bottom of the page.
    2. Re:It is possible...but it could be messy... by Anonymous Coward · · Score: 1, Informative
      Well, what you describe is exactly what a decent program should do when it receives SIGHUP ("hangup").

      Try it with vi sometime. Start editing some file. Then, from another window, kill -HUP it and watch it checkpoint everything. "vi -r" recovers.

      The whole point of SIGHUP was to give your programs a chance to do something reasonable when your modem (or whatever) connection drops for whatever reason.

    3. Re:It is possible...but it could be messy... by Anonymous Coward · · Score: 0

      I like Linux. It is the best. I think that everybody should use Linux.

    4. Re:It is possible...but it could be messy... by scrytch · · Score: 2

      > I think the better solution is to write a new signal called "SIGFREEZE"

      Which is not only how Solaris does it, it's what Solaris calls it. The counterpart signal is SIGTHAW. The signals are advisory though, the process isn't required to implement all the freeze/thaw logic in userspace.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
  113. Change X DISPLAY on the fly?? by Guiri · · Score: 1

    Is there any way to stop an X application, and restart it to a different display?? (i.e. other machine)

    1. Re:Change X DISPLAY on the fly?? by mrtensson · · Score: 1

      xmove, if you meant not stopping the application..

  114. CORE! by SJS · · Score: 1

    If we would have just stuck with core memory, we wouldn't be having these problems!

    --
    Pick One: http://www-rohan.sdsu.edu/~stremler/sigs/sigs.html (Note - disable Javascript first!)
  115. Cryogenic between two machines? by bigjocker · · Score: 1

    Couldn't it also be possible to hibernate a process, serialize it and send it to another machine with the same architecture to be executed there?

    This maybe sounds too crazy, but it should be possible with well designed systems (aka Linux).

    --
    Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
  116. IRIX does it by Cyrano+de+Maniac · · Score: 1

    IRIX has the capability to checkpoint and restart just as the original poster is asking for. It can successfully checkpoint and restart very complicated jobs, not just the simple programs that some of the posters have indicated.

    There are a number of items which cannot be automatically checkpointed (i.e. open sockets). However, through the use of signals, any application written to cooperate with IRIX's checkpoint/restart will be given an opportunity to gracefully save the portion of its state that the kernel cannot automatically handle.

    This is one of those capabilities of big mature UNIXes that is still awaiting implementation on most open-source UNIXes.

    --
    Cyrano de Maniac
  117. Ummm... by xanadu-xtroot.com · · Score: 1

    Windows has supported HIBERNATE for a couple YEARS now. Toshiba was doing it with their laptops even a couple years before that. What's the problem.

    Hell, not even a problem:

    What's the question? This has been done... AGES ago.

    --
    I'm not a prophet or a stone-age man,
    I'm just a mortal with potential of a super man.
  118. Freezing a UNIX box by herwin · · Score: 1

    Actually, I do it all the time on my UNIX laptop (Macintosh PowerBook G4 running MacOS X). I also run a Windows 98 box in emulation (VPC 5.0.2) pretty constantly on the G4, so I have two options: run the process on the emulated box and be able to recover after a shutdown, or run it on the PowerBook and use sleep.

  119. But it gains you nothing by DunbarTheInept · · Score: 2
    But VMware is typically running things twice as slow as native, so you gain nothing at all by running the project under vmware. Consider: Without a way to checkpoint the program, what happens if you have to start over near the end of the run because you had to kill it? You end up taking twice as long overall - the first aborted run plus the full run time again from scratch a second time. So in the worst case scenario, where the program is killed *just as* it was about to finish, you get performance as bad as running under VMware without a crash.

    It only is worth it if you expect to have to halt the program more than once. Assuming only one halt and restart, VMware is still slower.
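
    The arithmetic behind that, as a small Python sketch. It assumes the 2x slowdown claimed above, that every native abort restarts from scratch, and that a suspended VM never redoes work; the function names are made up.

```python
def native_total(run_time, restart_points):
    """Native total when each abort at fraction f of the run wastes f * run_time."""
    return run_time * (1 + sum(restart_points))


def vm_total(run_time, slowdown=2.0):
    """Total inside a VM that can suspend, so no work is ever redone."""
    return run_time * slowdown
```

    For a 3-day job, native_total(3.0, [0.99]) stays just under vm_total(3.0), while two aborts at 60% each, native_total(3.0, [0.6, 0.6]), push the native run past it.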

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  120. File Descriptors are per-process by parc · · Score: 3, Informative

    A file descriptor is a per-process entity. Yes, there's a big table of file descriptors that exists for the entire system, but file descriptor 5 for process a is not file descriptor 5 for process b. Not even if they point to the same file/pipe. A case in point is FD 0, aka stdin. Every process starts out with a stdin on FD 0.

    More important is how do you tell the kernel what file descriptor 5 pointed to? What if the file/pipe doesn't exist any more?
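
    One way to picture it, as a Python sketch for regular files only. It assumes the program itself remembers the path of each descriptor; describe_fd and restore_fd are hypothetical helpers, not an existing API.

```python
import os


def describe_fd(fd, path):
    """Record what a descriptor pointed to: its number, path and file offset."""
    return {"fd": fd, "path": path, "offset": os.lseek(fd, 0, os.SEEK_CUR)}


def restore_fd(desc, flags=os.O_RDONLY):
    """Reopen the file and put it back on the same descriptor number.

    Raises FileNotFoundError if the file no longer exists, which is the
    case the restored program has to be prepared for.
    """
    new_fd = os.open(desc["path"], flags)
    if new_fd != desc["fd"]:
        os.dup2(new_fd, desc["fd"])   # move it onto the original number
        os.close(new_fd)
    os.lseek(desc["fd"], desc["offset"], os.SEEK_SET)
    return desc["fd"]
```

    Pipes and sockets have no path to reopen, so this only covers the easy case.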

    1. Re:File Descriptors are per-process by Anonymous Coward · · Score: 0

      That's right - same deal for open socket connections.

    2. Re:File Descriptors are per-process by jelle · · Score: 2, Insightful

      Just return an error message. The application has to be able to deal with lost connections anyway.

      Note that you can SIGSTOP a process, then it will be on hold, may even become completely swapped out. Then you can SIGCONT the same process to let it run again.

      So you could send it a SIGSTOP and force it to swapout. That is just checkpointing until the next reboot... Of course you need more info to restore the process from the swap when the system reboots, but it's a start as to how to implement checkpointing.
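
      A quick Python sketch of the SIGSTOP/SIGCONT part on a POSIX system. The freeze, thaw and demo names are made up, and the waitpid check only works on your own child processes.

```python
import os
import signal
import subprocess
import sys


def freeze(pid):
    """Stop a child process and wait until the kernel reports it stopped."""
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def thaw(pid):
    """Let a stopped process run again."""
    os.kill(pid, signal.SIGCONT)


def demo():
    """Freeze and thaw a throwaway child, then clean it up."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        stopped = freeze(child.pid)
        thaw(child.pid)
        still_running = child.poll() is None
    finally:
        child.kill()
        child.wait()
    return stopped, still_running
```

      None of this survives a reboot, of course; the stopped process still lives in RAM and swap.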

      I'm sure there is more than one road to Rome.

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
    3. Re:File Descriptors are per-process by parc · · Score: 1

      Actually, the entire process will not be swapped out. The in-kernel descriptor table will remain in memory. Otherwise you couldn't deal with locking issues, etc.

  121. Buy a generator. by Super+Gimpy · · Score: 0

    Why is it every time there is a hardware problem you guys look for a software solution.

    Programmers.

    Drop $10,000.00 on a portable generator and the necessary wiring.

    Problem solved, with 0 lines of code.

  122. Solaris does this. by Anonymous Coward · · Score: 0

    Maintenance Commands sys-suspend(1M)

    NAME
    sys-suspend - Suspend or shutdown the system and power off

    SYNOPSIS
    /usr/openwin/bin/sys-suspend [ -fnxh ] [ -d ]

    AVAILABILITY
    SUNWpmowu

    DESCRIPTION
    sys-suspend(1M) provides options to suspend or shutdown the
    whole system.

    A system may be suspended to conserve power or to prepare
    the system for transport. The suspend should not be used
    when performing any hardware reconfiguration or replacement.

  123. this is available on some unices by Hawks · · Score: 1
    find yourself a nice solaris box and do:

    man cpr
    man powerd
    man power.conf

    Sometimes there are advantages of commercial OSs

    Hawks

    --
    in anima Apparatus
  124. Suspending your application code might be simpler by Aging_Newbie · · Score: 1

    Suspending an entire computer system at any given point is, in principle, possible given that you have thought through how to preserve the entire system state through a reboot and then restore it. Of course, you may also have to suspend and preserve data on other systems too, if you are depending on them. Laptops and Windows can do it fairly reliably for some applications. I think laptops work by getting the applications and OS into a safe and simpler state and then saving that state. I suspect they cannot save any arbitrary application you could write - just the applications they routinely run.

    Easier, however, would be to design your application such that it records its state maybe every hour or so. It could write pointers to incoming data, output data, and other important values to a log file. Given that smaller set of information you can resume the application at the last saved state and continue.
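
    A minimal Python sketch of that design. It checkpoints every N records instead of every hour, the record-summing job is only a placeholder, and all names here are made up for illustration.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state atomically: a crash mid-write leaves the old file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)   # atomic rename over the previous checkpoint


def read_checkpoint(path, default):
    """Return the last saved state, or default if there is none yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def process(records, path, every=1000):
    """Sum the records, resuming from the last checkpoint if one exists."""
    state = read_checkpoint(path, {"next": 0, "total": 0})
    for i in range(state["next"], len(records)):
        state["total"] += records[i]
        state["next"] = i + 1
        if state["next"] % every == 0:
            write_checkpoint(path, state)
    write_checkpoint(path, state)
    return state["total"]
```

    After a crash, calling process() again with the same path skips everything up to the saved "next" index.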

    Doing that can present a challenge to the design in many cases but I think it would often return your effort when you can stop the machine, debug your code, and continue from the last saved state. You don't have to restart from the beginning all the time...

    Above statements are IMHO and your mileage may vary.

  125. Re-crashing problem by DunbarTheInept · · Score: 2

    My concern with that is this: Let's say something buggy is making the system crash. Then if the persistent OS does its job with perfect accuracy, it's just going to end up re-creating the conditions that caused the crash, and Boom - crash again. The only way to avoid this is to NOT succeed at the goal of re-creating the conditions before the crash.

    --

    Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

  126. Solaris / IRIX, etc. by Anonymous Coward · · Score: 0

    It's quite do-able and has been done a few times. Solaris could do this for the whole machine, SGI IRIX unixen could do this per process, and Cray's had this as a key feature a long long time ago. For the first two, the guy who wrote the code used to sit next to me.

  127. VMWare isn't a solution to a cpu bound process by brer_rabbit · · Score: 2

    While I love VMWare, it does consume a substantial amount of CPU/memory. The problem is a job like what the original poster described is usually CPU or IO bound, and VMWare just starves the process from what it needs even more.

    Granted, it is a solution, but your job that ran in 3 days just got pushed out to a week. It's just a tradeoff.

    What the poster really needs is to rewrite the program to drop intermediate data along the way. If you have hourly checkpoints you can minimize the amount of data lost. How to implement checkpoints is left as an exercise to the reader :)

  128. Checkpointing? by rnturn · · Score: 2

    If memory serves me (hey, it is Friday after all and both brain cells are pretty tired) we looked into something like what the poster was asking about years ago. In those days, we were running some simulations on a PDP-11/70 that took 7-10 days to complete. In the event of a general power failure we wouldn't have been able to run on backup power for very long. DEC's RSX had a feature whereby a task could be checkpointed to disk. Then, presumably, it could be reloaded and resumed at the same state it was in at the time of the checkpoint. We never did implement it since it would have introduced too much delay into the project schedule (adding it to the simulation, testing, etc.) but it sounds like the sort of thing that could be useful in current day OSs. Anyone know of any general purpose operating systems today that have this feature? I haven't heard of any and wonder (not too seriously, mind you) if anyone sells core memory for a PC architecture computer. Of course, it wouldn't be very fast but you'd worry a lot less about power failures that are longer than the UPS's ability to provide power.

    --
    CUR ALLOC 20195.....5804M
  129. Think of VMware as a process wrapper by Binx+Bolling · · Score: 2, Insightful

    This is why VMware suspend works the way it does. It provides a consistent virtualized hardware interface, regardless of the details of the real hardware. The original question referred to individual process saving, and VMware suspend is similar to the whole OS suspend feature in laptops. Nevertheless, if you consider VMware to be a wrapper for individual processes that you want to be able to checkpoint, it turns out to be quite a nice solution to the original problem with zero programming required, and just a little pocket money to implement.

    bb

  130. Suspending processes by fitten · · Score: 1

    Back in a previous incarnation, one of the projects that was going on at our place was called HECTOR. I didn't work on it but some of my friends did. It worked on a variety of UN*X flavors. Something similar to what you are talking about was that any processes that ran through it could be suspended, including file handles and sockets, and then be started on another machine (sockets only as long as all the processes connected via sockets were also running over it). It was used primarily with MPICH (but would work independently of MPICH). It was used to load balance a network of workstations and migrated processes around the cluster if node(s) became loaded beyond some threshold. To find links, search on http://www.google.com using "HECTOR RUSS" or "HECTOR ERC" (Dr. Sam Russ was the lead).

  131. Hibernation by Anonymous Coward · · Score: 0

    Funny, my Win2K laptop can hibernate in the middle of a Winamp track and when I wake it up, it picks up in Winamp exactly where it left off -- before I've even logged in!

    Of course, it can only pick up the 802.11 connections half of the time and only knows how to use them half of that so I still have to reboot before I can see the world... but that's not the point!

  132. dump core, then pick it up in gdb and 'c' by PaulBu · · Score: 2

    you can always dump core of the process
    (e.g., kill -SIGSEGV), then load the core file
    into gdb (gdb program corefile) and
    issue 'cont'.

    The OS state would be gone though (so, no
    files besides stdin/stdout), but for purely
    computational process that might work as a
    one-time shot. At least you could save main
    arrays from gdb and read them into a modified
    program.

  133. look into process migration by moocat2 · · Score: 1

    There has been quite a bit of work on process migration. The idea is to transmit the entire state of a process to another computer to continue execution there. If, instead of transmitting the state to another machine, you wrote it to a file, this would do exactly what you are interested in.

  134. This isn't going to be received well... by neoevans · · Score: 1

    Doesn't Windows XP's Hibernate feature do exactly this?

    --
    "You are not a beautiful and unique snowflake."...Tyler Durden
    1. Re:This isn't going to be received well... by osu-neko · · Score: 1

      No, it doesn't do anything even remotely similar to this. You know, if you don't even know what the word "process" means when used in a computer science context, perhaps you should consider not posting to a thread about freezing/thawing a process... Not that this has stopped the majority of people posting here about laptop suspend/resume, VMware, and other completely unrelated topics. This is a simple case of process migration. Of course, process migration isn't all that simple...

      --
      "Convictions are more dangerous enemies of truth than lies."
  135. Virtual Machines by rjinbanff · · Score: 1

    This might be completely wrong, but couldn't you use something like vmware and 'suspend virtual machine'.

    I'm pretty sure that when you started up your virtual machine, your program would still be running.

  136. Solaris does this. by Anonymous Coward · · Score: 0

    I do this all the time on my Solaris Box, I press the power key on the keyboard (sun type 6 keyboard), and the entire system state is paged to disk. Power shuts off..

    When I power back on using the power button, it goes through openfirmware, boots the kernel, then restores the system state... paging typically takes about 90 seconds on my Ultra 30 with 1 gig of ram. There are two paging requests so roughly three minutes spent in doing suspend.

  137. Awww by Anonymous Coward · · Score: 0

    If you had checkpointed your calculations, you wouldn't have to redo them from the git-go, now would you?

  138. what about undump? by SparkGapTransmitter · · Score: 1

    There has been a utility available on unix for ages called undump that sounds like what you are looking for. It seems like old versions of emacs used to use this to decrease startup time by creating a new executable at the point that all of the initialization was completed. A quick search indicates a copy here.

  139. Checkpointing for NT and some history by Anonymous Coward · · Score: 1, Informative

    In the 1980s and maybe much earlier, the Cray supercomputers NLTSS operating system had this feature to allow stopping/restarting of applications and it was called checkpointing.

    For Windows NT, Lucent had a group that developed Fault-Tolerance software which had a checkpointing feature. This was called SwiFT.

    http://www.bell-labs.com/project/swift/

    At the same place but under the support, there is some mention of a Unix version.

    WhatMeWorry

  140. I'm suprised nobody has search fm yet... by cduffy · · Score: 2

    ...and found esky, a purely userspace checkpoint/resume implementation.

  141. Hmmm...done anything with corefiles yet? by zorander · · Score: 1

    I'm wondering since you can kill a program with sigabrt or sigsegv and get a core dump, would the core dump be enough to restart it again? I know gdb can do this for debugging purposes (although running real code inside gdb to accomplish this end would be quite the inefficient solution). I'm going to play around with options a little bit and see if i can cook something up...

    Brian

  142. Re:Great for high end biz systems during off hours by ADRA · · Score: 1

    After tooling with the kernel, I have been told time and time again, if it needs to be in the kernel, then put it in. If not, make it user space.

    This process hibernation deal does not need to be in the kernel simply because a program should have the option of sleeping, or what have you, built into the program's construct.

    To put an operating system kludge to support a program's shortcomings is a microsoft mentality that I would rather not have repeated in Linux.

    At the very most, the kernel should give the process the ability to capture all of its relevant data before closing.

    Reasons why to not implement "random access" process hibernation:

    1. File Access:
    The operating system would have to guarantee that the file descriptor is stored, and that the referencing file is not unlinked. The alternative is having the program smart enough to realize it was put into hibernation, hence throwing away the advantage of a kernel solution.

    2. Mutual Exclusion:
    If you have a program that uses OS based mutual exclusion, you will run into several problems.
    (a) If the OS does not yank the lock, other programs that share the lock will be stuck forever, which could break like tons of programs. Most shared libraries should use locking to keep the two processes from trouncing on one another during execution in the module, so...
    (b) If the OS releases the lock, then when unhibernated, the program could run into serious problems if it thinks it has a lock, but in reality, does not, or you can run into cyclic lock dependencies and race conditions if the locking code was not written right. This has the same issues involved with preemption.

    3. Networking:
    Pretty obvious, but if the program and the server do not know about the hibernation, the server should grumble but live on (never trust those clients..), and the client will probably become SOL and defunct. Since there is no notification that the connection was broken, the OS can either send an invalid descriptor (if it isn't stored), or it can be a little smarter and say that the foreign host closed the connection.. This one can be solved, but I think that saving the descriptor and reviving it could be interesting..

    I am sure there are things I have not covered, (Removable media syncs..), but this is too long already. There are a lot of technological factors which would make this very hard for a single kernel fix, but if we tied a unified solution into the user space, we could make a slow transition to supporting this.

    --
    Bye!
  143. Related by edmund_troche · · Score: 1

    This is yet another suspend alternative. This one is not your thread checkpointing type of solution, but allows for a software suspend with no APM support. I scanned through the messages and did not find a reference to this link so here it is, http://falcon.sch.bme.hu/~seasons/linux/swsusp.html

    I hope it helps.

  144. Should be pretty easy by SkewlD00d · · Score: 1

    Except for hardware and driver state synchronization; all you need to do is be able to pause a process, take the code from fork() and copy the process to disk instead of creating a new process, then kill -9 the process. Of course, you will have to iterate in a postorder tree traversal to get the leaf nodes first. I think I could bang this out in Minix in about 20 min, Linux kernel mods would probably take me a day. But, you'd have to modify all of the drivers to support "suspend-to-disk" type operation, such as ACPI winbloze type of stuff, because the hardware will have to be reset in a state to match the software.
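
    The leaves-first walk could look like this in Python. It is a sketch over a plain parent-to-children map of pids; how that map gets built, and the freeze_order name, are assumptions.

```python
def freeze_order(children, root):
    """Return pids children-first, so every child precedes its parent."""
    order = []

    def visit(pid):
        for child in children.get(pid, []):
            visit(child)
        order.append(pid)   # a parent is listed only after all its children

    visit(root)
    return order
```

    For a tree where pid 1 forked 2 and 3, and 2 forked 4, freeze_order({1: [2, 3], 2: [4]}, 1) gives [4, 2, 3, 1].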

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
  145. Solaris has done this for a while. by cgleba · · Score: 2

    I remember an option in Solaris 7 that lets you dump memory to swap, shut down the computer and when you restart it reads swap and drops you back into the exact same state as you were in before.

    Pretty cool because you could restore to a full X-session with all the programs and documents you were working on before undisturbed.

    I don't know if this is what you were looking for. . .

  146. A little over done? by maop · · Score: 1

    Just change the program to be able to save and load a partial solution onto disk. Problem solved.

  147. SGI Irix has Checkpoint and Restart by bsauls · · Score: 1

    From the cpr manpage:

    IRIX Checkpoint and Restart (CPR) offers a set of user-transparent software management tools, allowing system administrators, operators, and users with suitable privileges to suspend a job or a set of jobs in mid-execution, and restart them later on. The jobs may be running on a single machine or on an array of networking connected machines. CPR may be used to enhance system availability, provide load and resource control or balancing, and to facilitate simulation or modeling.

    There's even an option to restart the process(es) after upgrading the OS.

    Some caveats, the following system objects are not checkpoint-safe:

    o network socket connections; see socket(2)

    o X terminals and X11 client sessions

    o special devices such as tape drivers and CDROM

    o files opened with setuid credential that cannot be reestablished

    o System V semaphores and messages; see semop(2) and msgop(2)

    o memory mapped files using the /dev/mmem file; see mmap(2)

    o open directories

    Of course, you need proprietary SGI hardware.

  148. Re:Suspending your application code might be simpl by Suppafly · · Score: 2


    I think laptops work by getting the applications and OS into a safe and simpler state and then saving that state. I suspect they cannot save any arbitrary application you could write - just the applications they routinely run.


    If you've ever used a laptop with this feature you'd realize what you just said is totally wrong.. the hibernate function of these laptops is managed by hardware not software and so is os and program agnostic. When you close the lid or hit the sleep button, it dumps the entire state of the ram into a special partition and turns off.. when you revive it, you are back exactly where you left off, regardless if you are running windows or linux or if you are playing quake3 or cracking rc5 stuff.

    By managing this stuff in hardware, it's actually less complicated and works 100% of the time as opposed to the windows software solution that often refuses to 'wake' after being put in sleep mode and is dependent on the power supply being on and supporting the feature.

    If someone were to add a feature like this to a large multiuser mainframe type system, it would definitely make more sense to go with a hardware based solution that dumped the system state to a disk or multiple disks to ensure that it always worked and not just some of the time for some of the apps.

  149. MOSIX by JHillyerd · · Score: 1
    The MOSIX project does process migration on Linux. It works pretty well. I'm not sure what happens when the machine the process was launched from dies though, probably nothing good. =)

    MOSIX

    1. Re:MOSIX by Wumpus · · Score: 1

      I seem to remember that you can shut down the originating machine, and the shutdown scripts will force everything that can be migrated (i.e. isn't tied down to the hardware, or (unfortunately) a TCP/IP socket) to go find somewhere else to play.

      To solve this guy's problem, he would have had to run Mosix over a VPN with a redundant machine off-site. Not easy to set up or maintain.

      The VPN is necessary because Mosix sends internal kernel information over the network.

  150. Off topic? by Anonymous Coward · · Score: 0

    Off topic? You've got to be kidding. Flamebait is arguable, but off topic?

  151. Use an SGI by Anonymous Coward · · Score: 0
    Irix allows you to stop a process and move it to another processor - which is often another box linked to it via the CrayLink cable - making multiple boxes into ONE box (not a cluster, ONE BOX). You can then start it there. caveats to processes that talk with other processes.

    While the proprietary guys have been slow to get many of the Open Source Tools - Sun is just introducing gzip, apache in Solaris 2.8; ssh in Solaris 2.9 - they often ARE focussing on things that are dreams in the Open Source only world.

    Convergence should benefit both sides.

  152. Perspective on solution by rcj4747 · · Score: 2, Insightful

    At first this seems like a nice idea. It would be elegant to be able to halt processes and resume them later without them consuming resources in the interim.

    Before going forward ask yourself what the practical application of this work could be. If you have to reboot systems with long running computational work going on you may need more reliable hardware or better management of the system to increase uptime. Furthermore, adding "suspend/resume" functionality to a single process within its own code would probably be far better as needed.

    Secondly, think of the concerns you face in implementing this as a generalized solution for user processes. Here are the problems with this concept that I can see.

    First, file handles, file system pointers, network connections may not exist when the process is restarted. Let's say that there is processing of NFS data being done and when the process is resumed that mount is no longer accessible. You get an error from NFS like ERRIO or the like and the process dies.

    Secondly, the hardware may no longer be available. What if the process was using a PCMCIA card which has been removed? The process dies. In a more simple case, a process could have a tty open for I/O and that tty may no longer be owned by the user when the process is restarted.

    This requires saving a lot of system state and does little to guarantee that the process can be restarted successfully and safely. Furthermore, the dependencies for a single process (being fairly complex) would require a good knowledge of the process by the user to determine the feasibility of suspending and resuming the process.

    It seems that this would not be accessible to average users of the system even if it were possible to create in a generic sense.

    It does stand as a good question to start someone thinking about unix internals though.

  153. survival... by Anonymous Coward · · Score: 0

    ...of the fitness!

    I love it. I also like living life to the fullness.

  154. You can do OS-independent process-hibernation... by eyefish · · Score: 2, Interesting

    Something many people not familiar with J2EE (Java 2 Enterprise Edition) may not know is that when you have an application running in a Java container, it and the state of all its processes get automatically saved and restored whenever the container, the OS, or the machine crashes. True, in practice some diligence is required from the programmer (for example, when you need to set objects to a specific state upon re-instantiation), but the functionality is there, is OS-independent, and has been proven and used daily in heavy-duty environments for a few years now.

  155. Harrumph by Anonymous Coward · · Score: 0

    Free as in beer? If it's not free as in energy it's probably not worth lookin at.

    ;-)

  156. Re:Use Windows XP - OT by ADRA · · Score: 2, Informative

    Yeah, not really relevant to the main topic, but all modern PCs do have suspend support built into them, so the no-additional-software thing is a pretty moot point.

    Hibernation IS a software thing. It just means that when the OS receives or generates a shutdown-hibernate event, it writes all memory and state to disk and shuts down, setting a flag so that the OS knows on the next boot that it was hibernated.

    --
    Bye!
  157. Timesharing by Anonymous Coward · · Score: 0

    This would be great in environments such as universities that still sell time on Crays or similar systems. Let's say you have a simulation that will take 3 days of nonstop computing to complete. Well, if you were able to lease 2 hours a day for one month, you might get it finished. Or if you were able to get time on multiple machines, you could transfer a frozen image over to another system. It doesn't sound that hard to implement, just so long as it is well tested to prevent corruption.

  158. HP3000 also. by Anonymous Coward · · Score: 0

    We used to have a feature like this in an old HP3000 (model 70?) minicomputer back in the early 1990's. We ran a text-mode accounting system and all users had HP2932 dumb ascii terminals connected via RS232 serial lines. Whenever the machine would encounter a power failure (we had no ups), after the system was restarted, all the users' sessions and the programs they were running would come back to life at the point where they left off. You just had to hit the enter key and all the applications' screens would repaint themselves. You might have lost a few fields worth of data on the immediate screen you were entering data upon, but that was all you lost when the system went down. This type of powerfail recovery was really a nice feature. Sometimes when the O/S (MPE) would crash, we could also recover users' sessions, but usually a hard O/S crash meant that the recoverability info also got corrupted and everybody lost their sessions completely.

  159. Mouth-to-mouth with an Octane by Anonymous Coward · · Score: 0

    With IRIX you can perform CPR and the process will miraculously revive!

    http://techpubs.sgi.com/library/dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Admin/CPR_OG/%40InfoSearch__BookTextView/110

    "Are you an Anonymous Coward or just too lazy to register?"
    -- Anonymous Coward

  160. If you really want this... by J.C.B. · · Score: 2

    ...why not just boot up classic at startup? My brother set his computer to do this, you can too if you don't want to wait.

  161. This is not a new concept... by Anonymous Coward · · Score: 0

    SGI's IRIX has had this for a while. Ironically enough, it's called CPR.

    IRIX Checkpoint and Restart (CPR) is a facility for saving the state of running processes, and for later resuming execution where the checkpoint occurred.

    See the IRIX Checkpoint and Restart Operation Guide for more details.

  162. Why over-engineer it? by Anonymous Coward · · Score: 0

    OK, Slashdot, let's cut down on the 'feature creep'. Download managers and web-page mirroring software have been doing this for years.
    1. Write current state to log at say 15 min. intervals.
    2. Continue whenever you want from current log state.
    Easy.
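
    A minimal sketch of those two steps in Python, assuming the job's state fits in a small dictionary. The file name `job.ckpt`, the helper names and the sum-of-squares workload are made up for illustration, and the checkpoint is taken every N iterations instead of every 15 minutes to keep it short.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically (temp file, fsync, rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # a crash leaves either the old or the new file


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"i": 0, "total": 0}


def run(path, n, every=1000, stop_at=None):
    """Sum the squares of 0..n-1, checkpointing every `every` steps.

    stop_at simulates a power cut: the function returns None at that step.
    """
    state = load_checkpoint(path)
    while state["i"] < n:
        if stop_at is not None and state["i"] == stop_at:
            return None  # work done since the last checkpoint is lost
        state["total"] += state["i"] ** 2
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

    Calling `run()` again with the same path after an interruption picks up from the last saved step, so at most `every` iterations are repeated.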

  163. Blame the coder? by JoshMKiV · · Score: 1

    Ok, perhaps it is not possible in this instance, but for the vast majority of systems, you can store data as you are going through. If you KNOW the process is going to take ages, it should be implemented in code.

    Bad coder, no cookie.

  164. Can you have both? by NetJunkie · · Score: 1

    If you wanted to save developers work, you'd have to do this at the hardware level like a notebook does. That way the entire system, OS and all, get dumped to disk.

    If you wanted only specific apps to do it, then the developer of the app would have to handle that...I would think....

    1. Re:Can you have both? by mlheur · · Score: 2, Interesting

      what if the OS had a hook in it, something like
      `kill -FREEZE <pid>`
      No new hardware, only done once, will work on all processes.

      And as described previously, the FREEZE signal would cause the process to dump execution code, memory pages, FD's etc. etc. to a dump file.

      reboot the system.

      Then find some way to execute that dump file which will in turn load FD's, pages, execution code, and resume with the IP (instruction pointer, not IP Addr. for those not arch inclined) in the same spot?

      /me isn't much of a kernel hacker so I don't know the details of how to do it, but that's my high level solution.

    2. Re:Can you have both? by TheCarp · · Score: 1

      Actually...this is probably better done in userspace.

      Send a signal to the proc causing it to dump core (means the proc can't trap at least one of those signals)

      Then have another utility to turn the core dump into a new executable.

      I have been told (by a much more experienced hacker than myself) that such a utility is part of the emacs compile... it makes a small lisp interpreter, then starts sucking in lisp code... finishes... dumps core, then the core dump is turned into the emacs executable.

      Quite doable....if evil magic.

      And a silly solution to this problem, since using it requires that you shut the program down in this very graceful manner. It's also not portable to new platforms, since it requires knowledge of internal OS and architecture stuff.

      Much better to use log files (or "checkpoint" files to use a more pointed term) as those are a portable solution that still does the job if someone trips over the power cord, or the machine otherwise dies before you can gracefully kill the process.

      -Steve

      --
      "I opened my eyes, and everything went dark again"
  165. While we're at it by electroniceric · · Score: 1
    Would it be possible for another machine to pick up a process that has been frozen to disk and run it?

    That would be sweet - when the load on my home machine gets too high, I just freeze some process, send it over to another machine and finish running it...and eventually have the OS work up to just picking up cycles automatically.

  166. Look at KeyKOS by DV · · Score: 2, Interesting

    Basically that was one of the ideas behind the research on micro-kernels. If the state of the system gets small and centralized enough, one could make not only a single process persistent but the full system persistent.

    KeyKOS was a very promising system offering this at the time. One could not checkpoint the connections outside of the machine, but their demo was a BSD machine with X11, whose power plug was violently removed. When replugged, the state of all processes saved at the last checkpoint was resumed and the system would continue ... Including X-Windows !!!!

    Now wait for the Patent to expire, put it in Linux and watch the world of computing change.

    It was very promising at the time I was doing my PhD 10 years ago, I don't know why this never "made it"

    Daniel

  167. Re:Former Enron Exec, Found dead at 42. by Anonymous Coward · · Score: 0


    BAXterrr...

  168. Offtopic? by Anonymous Coward · · Score: 0

    come on, thats a complete load of bull

  169. Process migration by Khelder · · Score: 1
    I know it's not exactly what you're after, but transparent process migration was part of the Sprite operating system (developed between '84 and '92). Anyone planning to implement single-process hibernation/suspension/whatever-you-want-to-call-it might want to check out their papers.

    Other cool features of Sprite included a log-structured file system (yeah, everybody has one now, but they didn't 10+ years ago) and RAID.

  170. BTW, simple UPS in PSU? by KjetilK · · Score: 1
    This is slightly OT..., but:

    Why don't standard power supplies come with a small built-in UPS?

    I lived at a place for a short time where the fuses blew now and then. It only took me a few seconds to fix it, but it caused loss of some data and an unclean crash.

    I only need, say, a minute and a half. When the UPS loses power from the mains, it waits a few seconds to see if it was transient, then alerts the user (the user probably knows, as the lights may have gone out). If power isn't restored within one minute, the system will begin to take down and save large processes. When, finally, the "I'm dying" signal comes, the system will do a clean shutdown.

    While I know of a few internal UPSes, some of which seem neat, I only know about one such unit, Amsdell IPPS, but that company seems rather dead. I know they still exist, but I'm not sure they will ever power an Athlon. Also, it comes with short cables, I need some long ones for my cabinet, and you need to take a wire out of the cabinet. I looked at my Mobo, and I have an SMBus, perhaps that could be used for this purpose?

    I mean, this should be widespread...

    I guess the answer to my question is that most people are so used to crashes, a crash because of power loss isn't such an issue... :-)

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
    1. Re:BTW, simple UPS in PSU? by man_ls · · Score: 2

      I'm thinking an internal UPS that operates sort of like the "shed" switch in small aircraft. When you have to use it to conserve power, it kills all power to all instruments on the copilot side, some lights, and other stuff, leaving only the radios and nav instruments operating - enough to fly but not enough to do much else.

      The question is - how would Windows react to suddenly having its, say...CD-ROM and floppy drives just cease to exist while the OS was running? I've accidentally pulled power cables on drives that weren't in use, had no active handles on them, and hadn't even had media inserted into them that session, and they still caused massive problems in Win2K.

      I don't think most OSes would react well to having the power to everything except the processor and hdd0 shed from under it, even to conserve power while a savestate took place.

  171. This is why EROS was invented... by jcr · · Score: 2

    Check out http://www.eros-os.org.

    EROS processes persist until you take them down. They persist across power loss, system upgrades, etc, etc.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  172. That's what SIGs are for. by Anonymous Coward · · Score: 1, Informative

    When the UPS daemon senses that it's time to shut down, it sends all processes a SIG to warn them. This gives each process a chance to clean up, save state, and exit. Your program just needs to respond in the appropriate way to the SIG your UPS daemon already posts, so it can resume where it left off next time it's started. Doing this on an OS-wide basis, I think, would be overkill.
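
    A rough sketch of what "respond in the appropriate way" could look like in Python. It assumes the warning arrives as SIGTERM, which is what an ordinary shutdown delivers; the signal your UPS daemon actually sends may differ. The file name `worker.state`, the counting "work" and the `step` hook are invented for illustration.

```python
import json
import os
import signal

STATE_FILE = "worker.state"  # hypothetical path, not from the thread
stop_requested = False


def on_shutdown(signum, frame):
    """Only set a flag; the main loop saves state at a safe point."""
    global stop_requested
    stop_requested = True


def work(limit, state_file=STATE_FILE, step=None):
    """Count up to limit, resuming from state_file if it exists."""
    global stop_requested
    stop_requested = False
    signal.signal(signal.SIGTERM, on_shutdown)
    try:
        with open(state_file) as f:
            n = json.load(f)["n"]
    except FileNotFoundError:
        n = 0
    while n < limit and not stop_requested:
        n += 1  # one unit of real work would go here
        if step is not None:
            step(n)  # optional hook, used here only for testing
    # Reached on completion and on shutdown alike: save where we are.
    with open(state_file, "w") as f:
        json.dump({"n": n}, f)
    return n
```

    The handler does nothing but set a flag, so the state is always written from the main loop, never from the middle of an update.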

  173. Sun Already Does This by Anonymous Coward · · Score: 4, Interesting

    Sun already implements a system suspend/unsuspend in Solaris that works on all boxes but the Blade 100's.

    10 years ago I worked on a Unisys Unix box that did it automatically, meaning you could pull the power out of the wall without any warning and then plug it back in later. When the system rebooted, it would say "there's been a power failure, recovering" and then put all the processes back to the way they were before. Even with an open vi session where I was actively typing, I wouldn't lose more than a character or two.

    I found out the machine had it quite by accident because my loser boss turned the box off one evening without doing a proper shutdown... Once I saw what it did, this required further testing :-)

    Still, what would be even better is if it could be done on a per-process basis. I can think of many reasons why you might want to suspend a process for a few days and bring it back later (say something you only wanted to run outside of work hours), but had no intention of shutting the whole box down. And this should be implemented in the kernel, not by hacking each program to provide this functionality.

  174. Yeah, original idea by Anonymous Coward · · Score: 0

    My Sun Ultra 1 (Solaris 2.5) has been doing this since the day it was new.

  175. Cray UNICOS by Huusker · · Score: 2

    What if the process has forked off a bunch of children? Are you going to archive all the children at the same time? What if the process has a whole bunch of files in /tmp, are you going to roll them up into the freeze state as well? What if you're using pthreads? Are you going to keep the state for each thread? How about file pointers?

    Back in the 80s, Cray UNICOS had a Cadillac checkpoint package. It could track child procs, save /tmp files, save threads, save pipe data, and pass down SIGCKPT for user-controlled checkpoint.

    Of course at $1000/hour you want to damn sure be able to save your work :-)

  176. Linux Innovation by Anonymous Coward · · Score: 0

    Windows already has this "feature" (for better or for worse). Of course, if/when it is put into Linux, it will be innovation?

    1. Re:Linux Innovation by SLi · · Score: 1

      No, it doesn't. Try reading things again and go figure out why this is different from suspending a laptop.

  177. VMware already does something like this by linux386 · · Score: 1

    The suspend feature in VMware can just suspend the entire system. The performance hit is usually not too bad for the added features like undoable disks and suspending. This is really helpful when you have a buggy laptop whose suspend freezes half the time.

  178. Palm sort of does that by josepha48 · · Score: 2
    Palm OS sort of does that.

    On a palm you can shut it off and when you turn it on it is where you left the device at. I think it would be neat too if this could be the way operating systems worked. Ideally one would be able to turn off the computer in the middle of an app and it would turn on at the same place it was left at.

    Of course the palm does not do multitasking, multiprocessing or anything like that and when you close an app it is usually sent back to its initial state.

    Maybe the way to do what this user wants is to take journaling to the next step, and rather than have a journaling file system, have a database file system where stuff is done in commits like a jfs. Then one could do rollbacks as well. This would require the whole system to be rethought, though.

    --

    Only 'flamers' flame!

    1. Re:Palm sort of does that by aderusha · · Score: 1

      well, palm kinda does this, as long as you have batteries to keep refreshing the dram. run out of batteries, and you lose everything.

      i think he's talking something more like the win2k "hibernate" feature, something which is also done in hardware on a number of laptops (the thinkpad series for 1)

  179. process migration by asubra · · Score: 1

    Couldn't agree more with the article and shawarma. It should be made possible on unix and unix clones. Better still would be, if the running process could be migrated to a different machine with total preservation of its current context, to run uninterrupted.

  180. Cryogenics for functions by BuffJoe · · Score: 0

    Although not quite related, in Stackless Python, there is work being done on allowing function invocations to persist in secondary storage and get called again in later program invocations. They call it "pickling" a function call. In theory, one could pickle the main() function call at some point in its execution and achieve the same effect as suspending the process and recording the state of the program.
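
    Plain CPython's `pickle` cannot serialize a running function frame, so the following is only a rough sketch of the same idea with the state made explicit: the loop variables live in a dict, and pickling that dict mid-run stands in for pickling the call. The `new_job` and `step` helpers are invented for illustration and are not Stackless Python's API.

```python
import pickle


def new_job(n):
    """State of a Fibonacci computation, kept in a plain dict."""
    return {"n": n, "i": 0, "a": 0, "b": 1}


def step(job, max_steps):
    """Advance job by at most max_steps iterations; True when finished."""
    while job["i"] < job["n"] and max_steps > 0:
        job["a"], job["b"] = job["b"], job["a"] + job["b"]
        job["i"] += 1
        max_steps -= 1
    return job["i"] == job["n"]


job = new_job(90)
step(job, 40)                  # run part of the work
frozen = pickle.dumps(job)     # "freeze": bytes that could be written to disk

thawed = pickle.loads(frozen)  # later, possibly in another process
step(thawed, 10**6)            # finish the remaining iterations
result = thawed["a"]           # the 90th Fibonacci number
```

    The price is that the programmer has to keep everything worth saving out of local variables, which is exactly the chore the Stackless work is trying to remove.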

  181. Adams Constant by dynamic_cast · · Score: 1

    If your program dies for any reason, just before computing the answer, the answer was most likely to be 42.

    This is the Adams Constant.

    ;)

  182. It already has it.. by Anonymous Coward · · Score: 0

    When you first start Classic it boots up MacOS 9. However, after a specified period of time of not being used it goes to sleep and consumes 0% CPU and pages all the memory - it's just like turning it off. Just run an OS9 app and it comes back to life.

    You don't want it to always start from a saved state because OS9 just isn't reliable enough.

    Because you never have to turn off OSX (sleep mode works great) you should never have to launch Classic more than once.

  183. sandbox, virtual machine by Deadplant · · Score: 1

    Perhaps some sort of wrapper could be developed that you could run most simple apps in that could be suspended...

    Certainly one would think that apps written in languages like java that are already set to run in a sandbox would be fairly easy to wrap and suspend... as opposed to things like X windows or things that work too closely with the hardware.

  184. Buy a generator by Anonymous Coward · · Score: 0

    The simple way is to simply buy a small generator.

  185. Somebody has to say it... by kneeo · · Score: 1

    Let sleeping processes lie

  186. Its called "Hibernation" by Anonymous Coward · · Score: 0

    MS 2k/xp os's have done it, why can't it be done on UN*X?

  187. Re:Search in the slashdot archives for kernel patc by Anonymous Coward · · Score: 0

    Ahem, this should be -1 redundant... this was covered 900 posts ago and is marked +5 already.

  188. I want my debugger to do this!!! by RockyJSquirel · · Score: 1

    It would help so much in debugging if you could save the current state of a process so you could try "debugging from here" over and over.

    Rocky J. Squirrel

  189. Re:HP does it with Omnibook by Petrus · · Score: 1

    My HP Omnibook saves to disk when it runs out of battery. When I switch the laptop back on again, it first restores the RAM for a few seconds and then it freezes ;-(

    It only restores if the batteries run out in Windows.
    So, obviously it is not a very good implementation.

  190. I think we are all missing a point of complexity.. by NoahPlark · · Score: 1
    First off, let's exclude the "program the suspension into your software" ideas. Not that they are not valid (they are extremely valid), but let's say that you either do not have access to the code or the integration of suspend/recover in the code is prohibitive because of complexity, time to test, resources, etc. Also, let's exclude the "whole system shutdown" approach, which is covered by other software.

    What you are really looking for here is twofold: You want to be able to suspend/recover more than one program associated with a particular task (let's say a program with its associated database tasks) AND you need to coordinate the suspension/recovery.

    So what you need is a coordinated core system that cores out the processes you tell it to (along with child processes) at a single specific time. The coordinated core system would then be able to restart the processes at a later date.

    Wow! That would be cool. How many times have I wanted to freeze a set of processes, reboot a box, and then start them up where they left off. It would be a great debugging tool!

    OK, feeling a little giddy.... deep breaths....

  191. Checkpoint/Restart for Linux by Eric+Roman · · Score: 1

    For those of you interested, I'm part of a group developing checkpoint/restart for Linux. We're fairly early off in the project, but we're going to be adding this feature to Linux fairly soon. (Hoping to have a patch/module release out in May.)

    We're putting two features in: Checkpoint/Restart and Suspend/Resume. Checkpoint/Restart allows you to save a running session or process to disk, and restart it sometime later, on a different node, or after a system reboot. Suspend/Resume does more or less the same thing, but keeps the process data structures in the kernel, without writing them to disk. S/Resume won't work through a reboot, but it's useful for certain applications. You can think of it as a combination of swapping the process to disk and hitting ^Z to nab the process.
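
    The ^Z half of that analogy can already be tried with ordinary job-control signals. A small sketch, assuming a POSIX system: `SIGSTOP` (the uncatchable cousin of the `SIGTSTP` that ^Z sends) freezes a child process in memory and `SIGCONT` thaws it. This is not the kernel work described here; nothing is written to disk, so it does not survive a reboot. The helper names are invented.

```python
import os
import signal
import subprocess
import sys


def freeze(proc):
    """Stop the child and wait until the kernel reports it as stopped."""
    os.kill(proc.pid, signal.SIGSTOP)
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def thaw(proc):
    """Let a stopped child run again."""
    os.kill(proc.pid, signal.SIGCONT)


def demo():
    # The child prints a line after a short sleep; we freeze it in between.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"],
        stdout=subprocess.PIPE,
        text=True,
    )
    stopped = freeze(child)
    thaw(child)
    out, _ = child.communicate()
    return stopped, out.strip(), child.returncode
```

    While stopped, the child uses no CPU but still holds its memory, file descriptors and sockets, which is the part a real checkpoint has to write out.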

    We're putting in some signalling mechanisms, to allow the process to catch the checkpoint, restart and continue signals. We're also going in and adding some code to capture data in pipes and FIFOs. It'll work with multi-threaded processes, and full UNIX sessions (so you could checkpoint, say, a login shell and e-mail it to all of your friends. :)

    Our checkpoint/restart is meant for scientific applications, but should work on just about anything else. We're going to spend this summer hanging out with the LAM crew to make it work with MPI applications correctly.

    For those of you looking for something to download, I'm sorry I can't post a working link right now, or any code. We just got past our requirements document, and we're putting the design spec's together now. The req's doc't is due to be published next month, an implementation survey is coming out in March. If you're interested in having a look at those, drop me a line, and I'll let you know when they're available.

    - ERoman at (no spam) lbl dot gov

  192. checkpointing Unix processes by Anonymous Coward · · Score: 0

    A good place to start is the technical report from the UW's Condor project here

  193. Gasp! by Anonymous Coward · · Score: 0

    Hasn't Windows 2000 had this ability since it was released (hibernate)? GASP! Could it be that Linux is missing a feature that Windows has?! Ack! Quick, someone hack the kernel before we are all assimilated by M$!!

    *crude laughter heard in the background*

  194. Should be easy... by Doctor+Memory · · Score: 1

    just swap the process out, then write the pages from the swapfile to a regular file. Like the shop manuals say, "assembly is the reverse of disassembly"...

    --
    Just junk food for thought...
  195. UML instead of VM? by fragment · · Score: 1

    A lot of people have said that some sort of VM would be ideal for this (VMWare, JVM, etc.). What about User-Mode Linux? Would it be feasable to either add checkpointing to the UML patches, or to load/unload UML in a frozen state?

  196. Re:OS X needs this especially - or not. by Judge_Fire · · Score: 1

    The more cumbersome Classic OS 9 feels, the more it drives home Apple's point of getting developers to OS X.

    If users feel the pain, too, they'll bitch about what a pain Classic is and how everything is cool in OS X.

    I bet Apple wants to avoid a pro- longed two- system situation, for example by NOT making it nice and comfy.

    Judge_Fire

  197. Lets extend it further...... by Anonymous Coward · · Score: 0

    Couldn't we extend this idea further to allow, say, a running system a kernel update without rebooting? Preserving those precious uptimes everyone drools over? Just a random thought I had while drawing boring network diagrams and thinking of more fun things......

    ------------
    Human Stupdity + Computers = IT

  198. SVR4 Powerfail Recovery by relliker · · Score: 1

    AT&Ts SVR4 Powerfail Recovery mode does it well. You can set Powerfail strategy to either shutdown or recovery using the 'strategy' command or setting the STRATEGY variable manually in /etc/default/dump. In recovery mode, memory is saved to the dump slice and when the power comes back on, it is restored and continues where it left off. Simple as that. Network connections DO suffer obviously but even an active Informix engine continued running after such an outage. Alas, NCR's roadmap is going to kill SVR4 3.02.01 in a short while :(

  199. Checkpointing Support Issues by Anonymous Coward · · Score: 0
    What you want is persistence of software structures, so that you can restart from the point where they were last recorded in a valid state and replay the updates since that point, rather than restarting from scratch. The ways of handling this are:

    1. Build it into the application. Numerical methods developers have been known to do this. Many people in the parallel discrete event simulation community do this, sometimes using compiler-assisted tools.
    2. Build it into a middleware layer. This is the approach typically taken by many database management systems, by parallel discrete event simulation toolkits, and by distributed process migration toolkits such as Condor and BSP.
    3. Build it into the kernel. This was done by KeyKOS and the EROS group. If the kernel has support for persistent objects and the application uses this support, process restart is well defined. I've seen limited forms of suspend and wakeup for mobile devices as well, but this requires an explicit suspend/restart rather than fast crash recovery like EROS has.
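
    The "last valid state plus replay" idea can be sketched, at the application level, as a snapshot file combined with an append-only update log. This is a toy key-value store; the class name `Store`, the file names and the JSON format are assumptions made for illustration and are not taken from any of the systems named here.

```python
import json
import os


class Store:
    """Key-value store recoverable from a snapshot plus an update log."""

    def __init__(self, directory):
        self.snap = os.path.join(directory, "snapshot.json")
        self.log = os.path.join(directory, "updates.log")
        self.data = {}
        self._recover()

    def _recover(self):
        # 1. Load the last snapshot, if any.
        if os.path.exists(self.snap):
            with open(self.snap) as f:
                self.data = json.load(f)
        # 2. Replay every update logged since that snapshot.
        if os.path.exists(self.log):
            with open(self.log) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Log first, then apply: anything visible in memory is on disk.
        with open(self.log, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write the new snapshot atomically, then start a fresh log.
        # Replaying an old log over a newer snapshot is harmless here,
        # because each entry just sets a key to a value.
        tmp = self.snap + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap)
        open(self.log, "w").close()
```

    Creating a new `Store` on the same directory after a crash rebuilds the data without the application doing anything special, which is roughly the division of labour option 2 describes.
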
  200. cryogenics by Anonymous Coward · · Score: 0

    I had a similar thing happen last year. The power went down while I was running something on one of my computers. Not being a techie, I chose the simple option. I had only two of my computers running. All of them have UPSs of one brand or another. I unplugged the important backup and plugged it into the one next to it. I daisy-chained all five backups, and it worked. When the power came back on two hours later, the daisy-chained backups were still running.

  201. Kind of like . . . by The+FooMiester · · Score: 1

    Kind of like how the earth was destroyed by the Vogons 15 minutes before the ultimate question would have been produced? If only the mice had compiled the world with the SavePlanetState() function . . .

    --
    The previous has been a secret message to my comrades.
  202. My experience with windows xp hibernation by Afrosheen · · Score: 2

    Step 1: Clean, fresh install of XP Pro corporate.
    Step 2: The requisite reboots until everything works.
    Step 3: Leave the office, set computer to hibernate for fun.

    I. Results
    A. Blue screen of death upon return to office.
    B. Reboot yielded '/windows/config file is missing or corrupt'.
    C. Much cursing and a swearing off of anything Microsoft.

    XP isn't as wonderful as people would have you believe. A short trip to google inquiring about repairing this mess will result in endless posts.

  203. User mode Linux + checkpoint == Very useful by Anonymous Coward · · Score: 0

    I can't believe no one has mentioned this yet. User mode Linux is a linux kernel that runs as a process under Linux. You can run many copies of Linux on a single machine. With the checkpoint (or whatever you want to call it) system we're talking about here, you can save the entire state of a system. Very nice.

  204. Ejasent did/does this by AlexOsadzinski · · Score: 1

    A couple of years ago, I interviewed with a company called Ejasent (formerly Apera) that had a modified Solaris kernel which allowed them to freeze processes, and then thaw them very quickly (low milliseconds). Their goal was to build an edge server network allowing very quick bring up of common apps (like "start Oracle and look for this book") for browser-based clients. I met with them again a couple of months ago, and they seem to have made a lot of progress, and the technology is now sanctioned by Sun. http://www.ejasent.com

  205. another common use for this... by Anonymous Coward · · Score: 0

    ...is a lisp/scheme interpreter making a dump of its state which can be later loaded or executed. i think the scm interpreter uses the code from emacs (which makes sense as it basically is an os in lisp (vi forever! =)), but the idea goes back quite a long way.

  206. The Holy Grail: socket migration problem by Anonymous Coward · · Score: 0

    Has any project solved the Holy Grail of distributed computing: migrating a process WITH ACTIVE SOCKET CONNECTIONS to another machine? Clearly this problem would need the assistance of an external router, but it could be done.

    1. Re:The Holy Grail: socket migration problem by Anonymous Coward · · Score: 0

      The solution is: Send a signal before you put anyone to sleep, and another when you wake them up. Or just send connection timeouts when they restart. I suppose if all of the processes were on the same system or cluster you could do something more useful, but most programs that deal with sockets need to be aware of arbitrarily long breaks in the communications.

  207. Dumb mice by WyldOne · · Score: 2

    And if those mice were so smart how come they didn't think about it? Even I know that hardware fails.

    --

    make Linux, not Microsoft. sin(beast) = -0.809016994374947424102293417182819
  208. Think ahead by Hobbes_2100 · · Score: 1

    If you're doing a long term processing job, it makes sense to store results incrementally. No big black magic here.

  209. VMware does this, easily and effectively. by PatJensen · · Score: 2
    I just deployed a FreeBSD 4.4 virtual machine onto an IBM NetVista using VMware Workstation 3.0, which can safely put any PC-based OS into a hibernation mode on demand with one click.

    This hibernation mode snapshot can be duplicated or even put on other machines in the event of a system failure. The virtual machine will then come back online like nothing ever happened, with hardware devices effectively still attached and processes still running.

    It works really slick, you can perform other tasks and come back to your virtual machine later without slow boot times. This will also work on Linux, Solaris, and Windows platforms. I'd highly recommend VMware for on-demand OS access.

    -Pat

    1. Re:VMware does this, easily and effectively. by Intrinsic · · Score: 1

      The only problem I see with this is that since it's a virtual machine, it's going to run a lot slower than a physical machine devoting most of its CPU time to that one thread.

  210. No need ... the answer is by tmarzolf · · Score: 1

    42!

    --

    This Sig has been depreciated.

  211. cryogenics by dildofire · · Score: 1

    ok, so we know it's easy to have a process write all the contents of its memory space to disk (dereference an invalid pointer in C, and the os will do it for you), so suspending isn't a problem. the problem, which is pointed out in several places in this thread, is starting the process back up again, because it won't start in the same memory space, pointers will be invalid, etc. so, would it be possible to have a program that would resurrect a suspended program, and spoof it to make it think it's in the same memory space it was before it was suspended? kind of an emergency vm of sorts. does this already exist? if not, why?

  212. ...getting annoying by Anonymous Coward · · Score: 0

    is it just me or does it seem like slashdot is being used like a search engine.

    "Hrm, I want to know about blah, but I'm too lazy to search on google, so I'm going to ask slashdot cuz it gives me a hard-on to get a post on the front page."

    The flow of redundant and pointless questions has ruined slashdot.

    Half of the questions on slashdot could be answered by someone who paid even 5 minutes of attention in a 300 level CS course or has a browser and 5 minutes to search!

    1. Re:...getting annoying by Intrinsic · · Score: 1

      Actually I thought that was an interesting question, something I have never thought of before.
      I think that was a worthy post, gets a lot of people talking about like-minded ideas.

  213. Go SGI by ctrl · · Score: 0

    SGI IRIX has had this feature (checkpoint/restart) for ages, precisely because it's used for scientific/numerical computations (thanks to its FP performance) and these computations run for ages.

  214. If it was really important not to lose it... by jelle · · Score: 1

    Then why didn't you and some colleagues lift the server and the UPS together (without switching it off), put it in a car, and drive to the nearest place with electricity before the UPS ran out?

    Or if the server was too heavy/big to move, then run to the hardware store and buy a generator.

    Other answer:

    If it's a Linux machine, then build a MOSIX cluster, then you can migrate the process to a system that still has power (assuming not all systems in the cluster lost power...)

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.
  215. SOLUTION! Use Windows by Anonymous Coward · · Score: 0

    Windows has a hibernate mode: it dumps memory to disk and reloads it afterwards.

    That's what you need. Don't waste time implementing this in unix when you can spend $100 for windows. I mean computer programmer time is more costly than the license for winbloze.

  216. I seem to remember by ThePhantomPiper · · Score: 1
    reading about theoretical OSes that were designed this way...if you flipped the power switch off then on again, you were instantly back where you left off. I think it was in "Accidental Empires" (Robert X. Cringely).

    --

    --
    "I'm not sure exactly what an AS/400 is, however, I'm pretty certain I wouldn't want one up my ass"

  217. Process migration across a heterogeneous network. by jahead · · Score: 1

    There is some great work being done with process migration across heterogeneous machines using checkpointing techniques. If a process is written in a standard language common on many platforms, like C or C++, it's actually quite easy to save the running process data in a text file and then start the same program on another machine. (Even floating-point data can be saved this way, preventing any of the architecture blunders that occur when saving data in binary, e.g. big-endian/little-endian.) There are plenty of libraries already out there that do this, but few of them save the data with different platforms in mind. Being able to do this type of process migration is great when working with architectures across the internet, or any other heterogeneous network environment. There's some work being done at Arizona State University under the direction of Dr Rida Bazzi to make this automatic across a network. That is, when a process fails on one machine, another machine of a different architecture is able to execute the process from the last checkpoint. There's a rough paper at http://www.public.asu.edu/~vidar/fault-tolerance-checkpointing.html that briefly describes checkpointing for fault tolerance.
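
    To make the text-file idea concrete, here is a minimal Python sketch. It is not the ASU project's code; the function names and the file layout are made up for illustration. It assumes the whole state is a step counter plus a list of floats, and it writes the floats with float.hex() so they read back bit-for-bit on any architecture.

```python
import os
import tempfile


def save_state(path, step, values):
    """Write a checkpoint as plain ASCII text, one item per line."""
    with open(path, "w", encoding="ascii") as f:
        f.write(f"step {step}\n")
        for v in values:
            # float.hex() is an exact text form with no byte-order issues
            f.write(v.hex() + "\n")


def load_state(path):
    """Read back a checkpoint written by save_state (hypothetical format)."""
    with open(path, "r", encoding="ascii") as f:
        step = int(f.readline().split()[1])
        values = [float.fromhex(line.strip()) for line in f if line.strip()]
    return step, values


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "state.txt")
    save_state(path, 42, [0.1, 1e-300, 3.141592653589793])
    print(load_state(path))
```

    A real program would also have to record where in the computation it was, which is the hard part the libraries mentioned above try to automate.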

  218. Re:This would be useful for more than just blackou by jelle · · Score: 1

    " If you could sleep processes "

    Yes you can, just send it SIGSTOP (signal 19 on Linux), and wake it up with SIGCONT (signal 18 on Linux).

    When you press ctrl-z on a terminal process, much the same thing happens: the terminal sends SIGTSTP, which stops the process by default (try for example 'find /', then press ctrl-z, you will get the prompt back and 'find' will be sleeping, type 'fg' to wake up find).

    killall -19 myprogram
    killall -18 myprogram
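
    The same thing from a program, as a rough Python sketch for a POSIX system. It uses the signal names instead of the numbers, since the numbers differ between platforms; freeze and thaw are just names picked for this example.

```python
import os
import signal
import subprocess


def freeze(pid):
    """Stop a process; SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def thaw(pid):
    """Let a stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    child = subprocess.Popen(["sleep", "30"])
    freeze(child.pid)
    # WUNTRACED makes waitpid report a stopped (not exited) child
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    print("stopped:", os.WIFSTOPPED(status))
    thaw(child.pid)
    child.kill()
    child.wait()
```

    Note that this only pauses the process in memory. It does not survive a reboot or a power loss, which is what the original question was about.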

    --
    --- Hindsight is 20/20, but walking backwards is not the answer.
  219. Condor by angel'o'sphere · · Score: 1

    There was once a project called Condor.

    From an American university. Wisc.edu?

    It uses a patched standard C library. It preserves everything including file handles. It was primarily intended to let processes migrate in a LAN from machine to machine.

    I set it up on my campus 10 years ago? Ah, I think it was 1993. It ran pretty well on SunOS 4.3 and DEC Ultrix.

    It tried to find an idle machine and migrated the process to that one. If the machine got loaded, it suspended the process and froze it. If there was another machine, it migrated there, otherwise it slept until a machine became available.

    Regards,
    angel'o'sphere

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  220. A case for Python by defile · · Score: 3, Informative

    Python supports a concept that it calls 'pickling' (which is also known as Object Serialization).

    It's extremely easy to save the state of any object along with the objects it references to disk with literally a couple of lines of code (like, 3). You cannot pickle whole processes, but it's effortless to write some skeleton code to resume the process from its last pickle. You can also define specific methods in each object that are called on pickle/unpickle for special cases (restoring network connections, for example).
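
    A minimal sketch of such skeleton code, assuming the job's whole state fits in one dict. The function names, the ".tmp" suffix and the toy computation are all invented for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Pickle to a temp file, then rename over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    # an interrupted save leaves the previous checkpoint untouched
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the last saved state, or default on a first run."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, upto, every=1000):
    """Sum the integers below upto, checkpointing as it goes."""
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < upto:
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.pickle")
    print(run(path, 10_000))
```

    If the process dies, running it again with the same path picks up from the last checkpoint instead of from zero. At most `every` iterations are redone.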

    The fact that it's an interpreted language shouldn't deter you. Python integrates easily with modules compiled from C, allowing you to accelerate time critical aspects of your code while rapidly developing the not so critical aspects.** Python was designed to solve the problems you're working on.

    Oh, and if you're short on time, don't worry; Python is extremely easy to learn.

    ** As most programmers have found, about 90% of their program's execution is spent in 5% of their code.

    1. Re:A case for Python by scrytch · · Score: 2

      Pickling is really trivial. Hell, I've worked on a pickler for C++. Perl has a plethora of picklers (say that 10 times fast), including Data::Dumper, FreezeThaw, and Storable. Then there's Java's Serializable. It's really not a terribly interesting problem; orthogonal persistence is. That's where you have an interpreter or other such runtime environment that can be started and stopped and moved around in the meantime, possibly with multiple processes attached to the runtime simultaneously. The "orthogonal" means that you don't do anything special (like pickling) to persist objects and retrieve them from the persistent store, they're just there when you want them, and their lifecycle is indefinite once you create them.

      Any programmable MUD is an example of such orthogonal persistence. Squeak and Self would be others. Personally I wouldn't mind such an environment for Python, but I'm not holding my breath.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    2. Re:A case for Python by defile · · Score: 2

      Python is by no means unique, heavens no. I just think they're features that are probably most accessible to the masses because of Python's popularity and ease of use.

      I really ought to get my hands on Squeak.

  221. my windows PC does it by Gumber · · Score: 2

    I just hibernate and system state is written to disk.

  222. NCR Towers did that also by Anonymous Coward · · Score: 0

    I remember working on an NCR tower back in 1988-1990. I used to set up the system for the salespeople to demo at a trade show, or a trainer to use in a class. I even logged in all the terminals and had them sitting in different screens. Then I pulled the plug out of the wall and shipped it across the country. A few days later, someone would plug it back in, connect all the terminals and power them up. If you hit a key to refresh the screen on the terminal, it would all be back exactly as I left it.

    I know network connections, etc. would not resume the same way, but if modern systems could do this, think of what a great feature that would be.

    Incidentally, if the system ever locked up, there was a reset switch you could poke with a sharp object, though I don't recall ever having to do that.

  223. VMS checkpoints by Anonymous Coward · · Score: 0

    If you can't spell downtime, use VMS.

    Checkpointing is built in to VMS. Here's a reference :

    VMS High Availability

    Compaq OpenVMS provides integrated and distributed batch processing.
    Batch processing permits non-time-critical applications to be scheduled
    in the background and processed on any of specified sets of available
    systems. OpenVMS also provides for batch restart -- permitting batch jobs
    to checkpoint application data and automatically restart after a system
    shutdown or failure. This gives you a simple way to schedule your
    non-priority tasks to gather available resources across a collection of
    nodes, or to schedule high-priority tasks transparently and
    automatically, without regard for which specific nodes are available
    when the job runs.

  224. Duplicate topic! by Mr-Pope · · Score: 1

    I thought that this is what this article was about:
    http://slashdot.org/article.pl?sid=02/01/17/1823233&mode=nested

    And to be more specific, this link:
    http://www.muropaketti.com/artikkelit/cpu/northwood2200/ln2/

    Sounds like computer Cryogenics to me!

    --
    "The only way to learn a new programming language is by writing programs in it." - Brian Kernighan
  225. not being a kernel expert but.. by jrexilius · · Score: 0

    this sounds like a kernel task, and I imagine it would re-use aspects of the sleep state. I could see this as simply a hack in the kernel that would accept a particular signal and place the process into a sleep mode but rather than writing to swap, writing to a special partition. this sounds reasonable but i have not played much with the kernel and only written a few programs that use signals.. This signal could be triggered by nut or some other UPS monitoring device or syslog or perhaps just set to run fairly often (like a journaling FS except for physical memory).. hmmm.. maybe I am just blathering..

  226. hibernate on my laptop by Anonymous Coward · · Score: 0

    Done this.

    I just thawed my old laptop (HP4150), and guess what, it was running Debian, had my last irc session (oh, it was immediately on the network as if it never left), on the windowmanager wmnet thing I can see my last net connection slowly going by, as if it was just a min ago that I last used it, and the movie I was playing on it (heh, crouching tiger) was still playing in mplayer.. it was cool.

    OH, I put it in hibernate sometime around summer of 2001!!

    This is what i love about laptops.

  227. Intergrief just plain sucks by Anonymous Coward · · Score: 0

    no text needed

  228. Classic boots on log in, not start up by Anonymous Coward · · Score: 0

    The problem with auto loading Classic is that it can only be set to start up on log in, as the Classic environment runs as an application too, so it is shut down when you log out. You can set Classic to sleep after a period of inactivity though (check the "Advanced options" in Classic prefs).

    My suggestion though, would be to try and replace the Classic apps you use with OSX alternatives; the number of Carbon and (better) Cocoa apps increases every day. I hardly boot Classic any more, let alone restart into OS9.

  229. Uh huh... by global_diffusion · · Score: 2, Funny

    GNU Emacs basically does this to reduce initialization times.

    I heard about this. But, my dear boy, I do believe that VI does this better and with more cryptic keyboard commands.

  230. Re:Per Process: SIG/PPID, etc by Anonymous Coward · · Score: 0

    The complicated part of getting this to work on a per process basis has to do with maintaining a process' notions about parentage, signaling relationships etc.

    In other words, simply writing a core file and restoring it isn't enough. If the process had any children, or if any other process on the system "cared" about your PID, then those relationships would have to be restored as well. For some very simple programs it might not be necessary, but to generalize it must be done. These same issues come up when talking about process migration and close coupled clustering (like Compaq/DEC's SSI -- single system image work).

  231. Tadpole had this... by clheiny · · Score: 1
    Something basically like what you are describing was (is?) available from Tadpole on their products. I used it on their SparcBook 3 series of Sparc based laptops (pretty cool things in their own right, but really expensive). They ran modified versions of SunOS 4.x and Solaris 2.x - I think (but am not certain) that the current Solaris suspend/resume is based on this.

    Basically, when you wanted to turn off the system, you would hit the Suspend button on the control panel (or an appropriate set of hotkeys). Your RAM image would then be flushed to a special disk partition, and the system would be powered down. At the next power on time, this partition was checked to determine if it was valid (I forget what the cookies were), and if it was, the boot loader would simply copy the contents of that partition to RAM.

    It worked pretty well for me - I went months at a time of several power-on/power-off cycles per day without a real reboot.

    --
    Racing is an addiction that makes heroin look like a vague hankering for something crunchy.
  232. I hate to say it but... by hateddamntruth · · Score: 0

    Windows ME does this using the "Hibernate" feature.

  233. VMWare by Anonymous Coward · · Score: 0

    I've already done this feat with Linux in the past. I run it under VMWare and hit the suspend button. Works like a charm.

  234. EROS is an entire operating system based on it by jeske · · Score: 2, Informative
    EROS is a research operating system built around the idea of making all processes persistent at all times.

    EROS' predecessor, KeyKOS, made waves at USENIX when they did a demo of a UNIX system + X Windows which would instantly restore the running state of all software when rebooted. It was basically a UNIX port to KeyKOS, and since everything in KeyKOS was persistent, so was everything in the UNIX.

    One interesting caveat with this type of OS is that you really need to use ECC memory, because bit errors can get saved to disk and propagated forever!

  235. My thoughts on what would be needed. by Vulture_ · · Score: 2, Interesting
    I know this has already been done, but I thought I'd throw in my understanding of what would have to happen:
    • The process' core would need to be dumped. This should be fairly straightforward, since some of them do this a lot already... ;)
    • The process' registers (main CPU registers, FP registers, and any other registers that might exist on whatever exotic arch you use) would need to be saved. This is already done by the kernel for context switching to other processes. Gdb also can fetch and change these, so it can be done from userland.
    • Last but not least, all of the kernel level state of the process would need to be saved. That involves saving:
      • The signal handler table.
      • The pending signals. (Since the process hasn't handled its pending signals yet, it needs to handle them at some point in the future.)
      • The state of whatever syscall, if any, the process was in at the time of freezing. (Or you can set errno = EINTR on most syscalls, if this isn't possible.) This would be rather interesting to implement -- what if you're using a different kernel when you thaw? You can't just save the pc of the syscall then...
      • The file descriptor table. This includes network sockets, which would probably have to be closed (set errno = EPIPE and send SIGPIPE on next write to a thusly closed socket?), for obvious reasons.
      • The System V IPC state. This means message queues, shared memory, and semaphores, all of which would have to somehow be recreated.
      • Any child processes, most likely.
      • More interestingly, any threads, which might be hard to tell apart from ordinary processes since they are ordinary processes with some exceptions.
      • Everything else...

    As you can see, freezing and thawing UNIX processes could get quite nightmarish if you account for all of the possibilities. (Most processes don't use SysV IPC, for instance.) Even the most (seemingly) trivial of syscalls would need to be modified (all socket functions, for instance).

    Note that it's a lot easier to freeze and thaw a virtual machine, because it's so much more self-contained -- all you need to save then is:

    • Core.
    • Registers.
    • The state of any simulated hardware devices (virtual screen, for instance).
    --

    The only way the typical /.er can pick up a chick is with a forklift. -- AC

  236. the answer by stinky+wizzleteats · · Score: 0, Offtopic

    was 42. The question is suspected to be 6 X 9.

    Damn Vogons.

    1. Re:the answer by Anonymous Coward · · Score: 0

      sheesh. i hope there's some Douglas Adams fans in the metamods.

  237. MacOnLinux also by mbrubeck · · Score: 2

    MacOnLinux has the same feature, for those of us not in Intel land.

  238. DMCA by Rogain · · Score: 1

    Perhaps, but couldn't that file be redistributed? And thus Apple would be violating its own Intellectual Property Rights, and thus would enter a death spiral, as the DMCA forced Apple's lawyers to sue each other, over and over again, until all corporate assets have been converted into lawyers' fees.

    --
    The current Slashdot moderation system is made by gay communists!
  239. ^Z by Rogain · · Score: 0, Offtopic

    But seriously....

    --
    The current Slashdot moderation system is made by gay communists!
  240. Doesn't anybody build from source anymore? by jflorey666 · · Score: 1
    It's called unexec(). It's been around for a decade and a half; just google for it.

    To be fair, it's only 98% of what you need. The other 1.8% you can get by poking around in /proc/{pid} and reading the state of open file descriptors.
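
    For the /proc part, a rough Python sketch of what the poking around looks like. It is Linux-only and assumes /proc/<pid>/fd and /proc/<pid>/fdinfo are available; open_files is a name made up for this example.

```python
import os


def open_files(pid="self"):
    """Map each open descriptor of a process to (target, offset). Linux only."""
    result = {}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            # each entry is a symlink to the file, pipe or socket behind the fd
            target = os.readlink(os.path.join(fd_dir, name))
            with open(f"/proc/{pid}/fdinfo/{name}") as info:
                fields = dict(line.split(":", 1) for line in info if ":" in line)
        except OSError:
            continue  # descriptor was closed while we were looking
        result[int(name)] = (target, int(fields.get("pos", "0").strip()))
    return result


if __name__ == "__main__":
    for fd, (target, pos) in sorted(open_files().items()):
        print(fd, target, pos)
```

    That gives you paths and file offsets to reopen and seek to on restore. Sockets and pipes show up as things like "socket:[12345]", and those cannot simply be reopened.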

    Golly. And people wonder why the bubble finally burst.

  241. freezing processes NOT whole systems by Anonymous Coward · · Score: 0

    I think that the guys talking about windows [any flavor] or Mac OS X miss the point. Whole-system freeze/restore might be fine on small user boxes/laptops, but it is quite unlikely that larger clusters should do that. It would be nice if the User [!!!] could set the shell/environment on a per-process basis so that in case the machine(s) go down, his/her process state gets saved. Imagine a parallel program running on 10 nodes where one node has to be replaced [because its network card is faulty or somesuch]. I would assume that this is possible given that it works for the whole OS.

    Peter

  242. which you've copied and pasted from... by Anonymous Coward · · Score: 0
  243. large cache hard drive with its own UPS by aaron_pet · · Score: 1

    Howdy, with RAM prices being so low... why don't we have hard drives with 128+MiB of cache, their own batteries, and a bit of a processor on board that will ensure the data is written even when the rest of the computer is turned off?

    --
    Please use [ informative / summarizing ] SUBJECT LINES
    Flame me here
  244. SQL-like Rollback by mattr · · Score: 2

    Suse on my Dell Inspiron 7.5K used to work with the suspend key, but no longer (X just hangs).
    But ancient software is involved.

    That said, rather than hibernation I'd prefer a software-UPS or time-rollback widget. How viable would it be to keep a very high frequency incremental save of state (even just the contents of a limited number of folders would be useful)?

    It would be useful to be able to send your machine backwards in time without requiring everything to be in a database or versioning system that requires explicit saves. I'd like to be able to remove the effects of every command in the history of all shells in reverse, in the right order, and have high-granularity access to previous states of a filesystem.

    If I could do that for all the relevant accounts on various machines it would be like never having to worry. I could leave the desk when I want to, kick the power cord or make meatheaded mistakes, and could keep a less paranoid number of full backups. I'd be worried about the life of my hard disk though. Does this already exist?

  245. Oh, the memories by Anonymous Coward · · Score: 0

    I'm sure most (frustrated :-)) IBM Aptiva owners are familiar with this concept. 'Back in the days' it was known as 'Rapid Resume', where you could just push the power button - and the pc would power off completely, and then when you turned it on again, it would be where you left it.
    Ahhh, trusty old 2144-z30

  246. hibase.cs.hut.fi by Anonymous Coward · · Score: 0
    Looks like they've got a programming language which is automatically persistent (recoverable from system crashes). You just write a program in their language and presto, you have a persistent application.


    hibase.cs.hut.fi

  247. How do you nick that? Pr0n? Well, it's just EROS. by Anonymous Coward · · Score: 0
    EROS stands for Extremely Reliable Operating System. It is very interesting in many ways and has included checkpointing from (AFAIK) the very beginning.

    Current status is not very advanced, but...

    The main advantage of EROS is that it was designed with provability and reasoning in mind. I was fascinated by the SPIN project, where you could put trusted components in kernel space. For that, they had to be written in a way that let the compiler prove their innocence, and the Modula-3 type system ensures that. If you look at the EROS papers you'll see that the authors look at the OS from an OS-as-programming-language point of view, which is very similar to SPIN.

    Take a look at EROS, delight (erect, excite) your mind. ;)

  248. Clever workaround by Anonymous Coward · · Score: 0

    The clever workaround to this that I used was to scavenge all the UPSes I could, and then plug the UPS of the critical machine into other UPSes, until the power came back on. It took 5 of them to do the job, but the machine never lost power.

    Of course, what you really want is "checkpoint and restart", which is about as old as computing itself.

    -- Terry

  249. Cheap checkpoint restart by Anonymous Coward · · Score: 0

    1) Install TeX to get "undump".

    2) man gcore

    3) man undump

    -- Terry

  250. Take it one step further by Allnighterking · · Score: 2

    What I'd like would really be one step further in the chain. Something like my Palm or the old Canon Cat. Turn it off, come back a week, month or year later and voila. You are right back at the same point you left, as if you never turned it off. The basics as I see it would be that RAM gets written to swap as an image (which is what the Canon Cat did). Then when you restart the box by turning it on, RAM gets re-initialized from the swap file back to the state it was in before power off. The other option would mean adding a small battery pack to a desktop. If you hit the power button on a box or pull the tail from the wall, RAM is maintained by the battery until you re-power the box (or the battery finally goes south). As I see it there shouldn't be any reason why a box once run through startup shouldn't be able to maintain its running state almost indefinitely. In fact if you could get Linux to do this one thing..... it would be on desktops so fast you wouldn't believe it. Unless you change hardware, what is the difference that requires the full init sequence anyway? The Greenpeacers would love it because people wouldn't mind turning off their comp since it's "instantly on". The only down side would be that you wouldn't want to stay logged in, but then what's the diff between being logged in with the monitor off and being logged in with an instant on feature? Course it would mean uptimes in years instead of days.....

    --

    I'm sorry, I'm to tired to be witty at the moment so this message will have to do.

    1. Re:Take it one step further by BCoates · · Score: 1

      Hibernation, you can do it on Windows 2000. When you can get it to work, hibernate (swap everything to disk and power down) and de-hibernate (reloads the image, give the password of the logged in user, right back where you left off) are much faster than shutdown/startup, particularly since you don't have to restart all those applications.

      It's a pain to get working and keep working, though, since (iirc) all hardware and all drivers must allow it, and windows is very unhelpful about which piece of hardware/driver is refusing to allow hibernate.

      --
      Benjamin Coates

  251. Journalizing by Glanz · · Score: 1

    If you use the ext3 FS, and a few other Journaling filesystems under Linux, it can be configured to journal data as well as metadata. This may also work under reiserfs if cache is closed down. Ext3, however, does not require this and works fine for that. Power outages, or red-button reboots are therefore no problem. The OS simply picks up where it left off upon reboot.

    --
    Rien n'est plus beau que le creux du 0.
  252. Linux patch: swsusp by styx_sd · · Score: 1

    Take a look at the linux-patch "swsusp", it might be something like what you're looking for.

  253. FOLK has a kernel patch for soft suspend by Anonymous Coward · · Score: 0

    http://folk.sourceforge.net/. that's it

  254. Uhm, am I missing something here, or...? by mixter · · Score: 1

    Or what would speak against suspending the process with kill -STOP and reviving it later with kill -CONT ? You can keep connections alive for ca. 360 seconds. I think some *nix flavors will also have the kernel keep the connection of suspended processes alive (right?). A small caveat is that you need to keep the tty open or the standard input/output file descriptors will be lost. :( And you just can't reboot in between.. but hey it's unix, not windows, you don't have to :D

  255. Software Suspend (for Linux) by MikeBabcock · · Score: 2

    Check out the software suspend patch for Linux. It allows the system to be suspended by SysRq-D (or shutdown -z) into swap space and resumed (or not) at the next reboot.

    --
    - Michael T. Babcock (Yes, I blog)
  256. Suspend by ThePlumber2 · · Score: 1

    Use APMD.... It can be used on regular machines also.

    It features the "sleep" mode and "suspend" mode and will swap all the system info to the HD.

    I'm sure it has already been said, but I like to be redundant P-)

    --
    Thanks, Steve
  257. SGI IRIX has this built-in: CPR by Anonymous Coward · · Score: 0
  258. There is another solution. by LazLong · · Score: 1

    I work for a large gov't lab where they run calculations for months on the fastest (currently) supercomputer, ASCI White. People who run codes for any period of time (generally greater than a day or two) write intermediate results at specified intervals so that they can resume in the case of an interruption. Seems like a good general solution that is OS feature independent (disclaimer for nitpicking flaming morons: This is with the assumption that the OS one is using has I/O capabilities).

  259. The continuation by n2kra · · Score: 1
    I never really understood Scheme (and CL ?) continuations until:

    An Intro to Scheme and its Implementation - Recursion in Scheme

    Of course this is inside a lisp process, BUT if Lisp is your OS...

  260. Think about the mice by tve · · Score: 1

    Due to a recent power outage, I've had to shut down a server running a process that had been running for ages calculating something. The job it was doing would have been done in a few days, I think, but I had to shut it down before the UPS ran out of juice.

    That's nothing! I once had a computer that was demolished to make way for a hyperspacial bypass seconds before finishing the program it had been running for millions of years.

    --

    If there is hope, it lies in the trolls.
  261. DIY by james(honest) · · Score: 1
    I once had to write a program that had many hours of computation. I had it use memory mapped files. It was written so that if it was interrupted, or more likely, if it crashed, it would restart where it left off. Very useful if the bit you are trying to debug only occurs after an hour of previous computation.
    Really if you write a program that is going to take more than an hour, you really should spend a few minutes doing it properly.
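
    Roughly what that looks like, as a small Python sketch rather than the original program. It assumes the progress fits in two 64-bit integers; the record layout, the function name and the stop_after knob (there only to fake an interruption) are invented for the example.

```python
import mmap
import os
import struct

STATE = struct.Struct("<QQ")  # next index, running total


def run(path, upto, stop_after=None):
    """Sum integers below upto, keeping progress in a memory-mapped file."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(STATE.pack(0, 0))
    done = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), STATE.size) as m:
        i, total = STATE.unpack_from(m, 0)  # resume from whatever is on disk
        while i < upto:
            if stop_after is not None and done >= stop_after:
                break  # simulate an interruption
            total += i
            i += 1
            done += 1
            STATE.pack_into(m, 0, i, total)  # progress lives in the mapping
        m.flush()
    return i, total
```

    Calling run("progress.bin", 1000, stop_after=400) and then run("progress.bin", 1000) finishes the job in two goes. A crash of the process normally leaves the mapped pages for the OS to write back, but a power cut or a kill halfway through an update can still lose or tear the record, so a careful version would flush at safe points and keep a checksum or two copies.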

    Or you could use any computer that can hibernate. I do find it highly amusing that in response to the "use XP" messages, the Linux community replies by saying "Oh that's not what he meant at all". It isn't? He said he had to shut down his machine because of a power failure. He didn't say "I want to shut down this one single process". Hibernation would quite clearly have satisfied the original post. But be that as it may, there seem to be quite a few non techies happy to jump about and talk about suspending a single process, without any thought to things like file handles, inter-process communication, or access to devices. I'm sure someone could implement a "hack" that would occasionally manage to save a process, but it would not be reliable enough to risk using. I would agree with the posters who suggest that creating a protocol through which a program can participate in suspending itself would be ideal, and if it can handle being restarted on another machine then perhaps we have moved on to talking about agents...

  262. Windoze cryogenics by belg4mit · · Score: 1

    Memory Dumper: http://download.cnet.com/downloads/0-10091-100-4008596.html?tag=st.dl.10001-103-3.lst-7-25.4008596

    --
    Were that I say, pancakes?