Slashdot Mirror


Logging Unexpected Shutdowns/Crashes w/ Linux?

sweede asks: "I have a dedicated server that seems to reboot more often than it should. In Windows 2000/XP (maybe NT4.0?), if your computer or server crashes it will leave an event message in the Event Viewer for you to review on what went wrong. Is it possible to do something similar in Linux? Where a power outage or an unexpected kernel panic will leave a message in /var/log/event (or whatever) Searching Google for 'kernel trapping' doesn't give me a whole lot of info on the subject."

8 of 86 comments

  1. Re:Kernel Panic on Linux? Sounds like hardware pro by Drakon · · Score: 4, Funny

    If you run 2.6.0-test6 with -mm15 and some home brewed patches, you can have crashes without hardware failure

    (one who speaks from experience) :-)

  2. Other OSes by menscher · · Score: 5, Informative
    This will probably be modded down as flame bait, but I can't resist pointing out what some other OSes have done when crashing:

    IRIX will core dump to the swap partition. On the next boot it analyzes this core file, which includes various system logs, etc, and saves useful output in /var/adm/crash. You know you've done a good job when the kernel panic causes a panic, called a double panic. I used to be able to trigger those at will. Hrmm, I should test that on the current release.

    AIX summarizes the likely causes of failure (power failure, someone pressed the power switch, or power supply died, etc). I've seen (but do not personally use) a similar thing with IRIX that actually assigns a percentage confidence level to its guess.

    Of course, usually you know there was a power failure because your UPS told you so.... I did have one case where we had a very brief outage (or maybe just a brownout). Every machine in the building had rebooted.... except one. That RS/6000 had an eerie log message like "power failure detected". And no, it was not on a UPS. I was rather impressed.

    Sadly, I don't know how to get any useful information out of Linux. And don't give me crap about it never crashing. I can prove otherwise. Too bad I can't figure out why.... Maybe a kernel developer will read this and copy some ideas from the commercial Unix vendors.

    1. Re:Other OSes by anthony_dipierro · · Score: 4, Informative

      IRIX will core dump to the swap partition.

      FreeBSD does this. HP-UX does this. I always assumed Linux did it too and it just wasn't turned on by default. I guess I was wrong.

      As a side note, my first job out of college was to analyze core dumps from HP-UX. There's an awful lot you can learn from these things. It's not just stack traces; the entire memory of the system is contained in the dump. It's time consuming, but a large portion of the time you can find out *exactly* what went wrong.

  3. Try the Linux Kernel Crash Dump (LKCD) patches by bigsteve@dstc · · Score: 5, Informative

    If you are adventurous, you could try applying the LKCD patches to your kernel. Start looking here

  4. Re:Easy... by Big+Jason · · Score: 4, Funny

    Not to be confused with last | reboot, which I've done before. Doh!

  5. Here's how: by samjam · · Score: 4, Informative

    1) First, disable console blanking, so that when you get to the crashed box and plug the monitor in you can see the kernel panic message:

        /usr/sbin/setterm -blank 0 -powersave off -powerdown 0

    We had some early kernel 2.4 Red Hat boxes crashing like the dickens for a while. It was a kernel problem, and only when it happened on a local machine under our eyes did we realise what had happened.

    2) Network syslog:
    If you syslog to a central machine, not only does it make error spotting centralised and easier, but it also means the last gasps of the crashed machine are logged on a machine that is still up.

    Sam
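
The network syslog idea in point 2 above can be sketched in a few lines of Python. This is only an illustration of the principle, not the poster's setup: the usual route is to have the system's own syslog daemon forward to the central host, which also carries kernel messages. The sketch instead sends application-level heartbeat lines over UDP, so the last line received by the collector marks roughly when the box went down. The function names, the logger name "crashwatch", and the default port are assumptions made for the example.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="crashwatch"):
    """Return a logger that forwards records to a syslog collector over UDP.

    'host' is the central log machine (hypothetical here); 514/udp is the
    conventional syslog port.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # Prefix each message with the logger name so the collector can filter on it.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def heartbeat(logger, note="alive"):
    """Send one heartbeat line; a gap in these on the collector brackets a crash."""
    logger.info("heartbeat %s", note)
```

A cron job or a small loop could call heartbeat(make_remote_logger("loghost")) once a minute, where "loghost" stands in for whatever the central machine is called. UDP is fire-and-forget, so a message sent in the instant before a crash may still be lost.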

  6. Yes! by twistedcubic · · Score: 4, Funny


    Is it possible to do something similar in Linux?

    Yeah, but we have to wait until our SCO insider funnels us the code.

  7. Some ideas by Gudlyf · · Score: 4, Informative
    Mission Critical Linux does this.

    There's also the LKCD (Linux Kernel Crash Dumps) package:

    LKCD contains kernel and user level code designed to:

    • Save the kernel memory image when the system dies due to a software failure;
    • Recover the kernel memory image when the system is rebooted;
    • Analyze the memory image to determine what happened when the failure occurred.
    --
    Trolls lurk everywhere. Mod them down.