Logging Unexpected Shutdowns/Crashes w/ Linux?

Posted by Cliff on Sunday September 14, 2003 @04:06PM from the finding-evidence-of-the-problem dept.

sweede asks: "I have a dedicated server that seems to reboot more often than it should. In Windows 2000/XP (maybe NT4.0?), if your computer or server crashes it will leave an event message in the Event Viewer for you to review to see what went wrong. Is it possible to do something similar in Linux, where a power outage or an unexpected kernel panic will leave a message in /var/log/event (or whatever)? Searching Google for 'kernel trapping' doesn't give me a whole lot of info on the subject."

Re:I'm pretty sure.. by Chester+K · 2003-09-14 16:40 · Score: 3, Interesting

That the reason Linux doesn't write anything to the HD after a panic is so that it doesn't mangle/destroy the FS.

Why not reserve a set place on the hard drive and write out error trap information there? There's no reason the filesystem needs to be involved at all. I'm going to guess that's what Windows does.
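
A rough sketch of the reserved-region idea, in Python for readability rather than as anything a kernel would actually run. It assumes a fixed 512-byte record at a known offset, with an ordinary file standing in for the raw disk area. The record layout and every name here (MAGIC, RECORD_SIZE, pack_record, write_record and so on) are made up for illustration. The checksum lets the next boot tell a real record from leftover garbage.

```python
import struct
import time
import zlib

# Hypothetical on-disk layout: magic, unix timestamp, message length,
# message bytes, zero padding, then a CRC32 over everything before it.
MAGIC = b"CRSH"
RECORD_SIZE = 512                      # assumed: one sector
HEADER = struct.Struct("<4sQH")        # magic, timestamp, message length
MAX_MSG = RECORD_SIZE - HEADER.size - 4  # 4 bytes are kept for the CRC


def pack_record(message, timestamp):
    """Build one fixed-size crash record, truncating the message to fit."""
    msg = message.encode("utf-8")[:MAX_MSG]
    body = HEADER.pack(MAGIC, timestamp, len(msg)) + msg
    body = body.ljust(RECORD_SIZE - 4, b"\0")
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_record(raw):
    """Return (timestamp, message), or None if the bytes are not a valid record."""
    if len(raw) != RECORD_SIZE:
        return None
    body = raw[:-4]
    (crc,) = struct.unpack("<I", raw[-4:])
    if zlib.crc32(body) != crc:
        return None
    magic, timestamp, length = HEADER.unpack_from(body)
    if magic != MAGIC or length > MAX_MSG:
        return None
    msg = body[HEADER.size:HEADER.size + length].decode("utf-8", "replace")
    return timestamp, msg


def write_record(path, offset, message, timestamp=None):
    """Write a record at a fixed offset; no filesystem metadata is consulted."""
    ts = int(time.time()) if timestamp is None else timestamp
    with open(path, "r+b") as dev:
        dev.seek(offset)
        dev.write(pack_record(message, ts))


def read_record(path, offset):
    """Read back whatever sits at the fixed offset and validate it."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        return unpack_record(dev.read(RECORD_SIZE))
```

At boot, something would call read_record on the reserved offset, log the result if it is not None, and then overwrite the area so the same crash is not reported twice.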

--

NO CARRIER
Re:Flag it. by Creepy+Crawler · 2003-09-14 16:42 · Score: 3, Interesting

You fail to understand what happens to create the "Dirty Bit".

1: System starts up (say clean).
2: It marks a bit on the partition that system has been started up.
3: Usage Usage Usage
4: Send shutdown
5: System umounts cleanly. Undoes "dirty bit"
6: Power == 0

On a dirty FS, stage #5 is never hit, so when the system comes back on, it checks the bit and detects the unclean shutdown. The bit is never written during the unclean shutdown.
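
The same trick works in userspace for the original question. A minimal sketch, assuming a marker file stands in for the filesystem's dirty bit (the DirtyFlag class and its method names are hypothetical): call startup() from a boot script and clean_shutdown() from a shutdown script. Nothing is written at crash time; the crash is inferred on the next boot because the marker is still there.

```python
import os


class DirtyFlag:
    """Marker-file stand-in for a filesystem's 'dirty' bit."""

    def __init__(self, path):
        self.path = path

    def startup(self):
        """Return True if the previous run never shut down cleanly,
        then set the marker for the current run."""
        was_dirty = os.path.exists(self.path)
        with open(self.path, "w") as f:
            f.write("dirty\n")
            f.flush()
            os.fsync(f.fileno())  # push the marker to disk before continuing
        return was_dirty

    def clean_shutdown(self):
        """Clear the marker; only reached on an orderly shutdown (step 5)."""
        if os.path.exists(self.path):
            os.remove(self.path)
```

This tells you that an unclean shutdown happened, not why. A power cut and a kernel panic look identical from here.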

Similarly, I see problems with what happens when the NT kernel crashes. How exactly does it manage to:

1: Read the partition
2: Read the program on the partition
3: Run the insert log program to add log entry
4: Still have the "blue screen"

I smell nasty data corruption waiting to happen. After all, if you can't guarantee the state of the kernel, can you really justify reading, writing, and executing on a crashed kernel?
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37
Re:I'm pretty sure.. by Creepy+Crawler · 2003-09-14 16:48 · Score: 3, Interesting

OK. Then how do you guarantee the state of the kernel? If you use BIOS calls, it screws up the memmap even more. That's assuming you can even pass something like that.

$100 question: How do you break out of inserted code that might itself have a bug? How do you determine what code had that bug?

Answer those, and then I'll trust a Write_after_system_crash API.
--
- Mod parent up! by Anonymous Coward (Score:1) Thurs, Nov 31, @13:37
Re:Other OSes by FueledByRamen · 2003-09-14 17:33 · Score: 3, Interesting

Of course, usually you know there was a power failure because your UPS told you so.... I did have one case where we had a very brief outage (or maybe just a brownout). Every machine in the building had rebooted.... except one. That RS/6000 had an eerie log message like "power failure detected". And no, it was not on a UPS. I was rather impressed.

I had a similar interesting experience with an SGI Indy (Irix 6.5.13, or thereabouts). I was booting it up after it'd been sitting for a while, just to see what I had running on there. While it was going, and I was fumbling around for an ethernet cable for it (it takes several minutes at boot to wait for a cable instead of noting its absence and moving on), I kicked the power strip that it was on and the plug wiggled around in the wall socket. I heard a spark jump in the socket, and the monitor it was on (Dell/Sony Trinitron 19") went to half-height mode for a few seconds, spitting and clicking, turning the screen on and off and varying the vertical height randomly.

I expected the Indy to kernel panic or turn off. Instead, below the complaints about the missing ethernet cable ("en0: link carrier not detected" or similar), there was a lone status message: "Power failure detected."

No UPS, no power saving devices of any kind, only the filter caps in the power supply between the logic board and the unreliable, crufty power system of a 70 year old house at the mercy of a power strip first used on my (brand new at the time) Atari 800. The other computer on the power strip (350 P2 running RH 7.1) rebooted hard, right in the middle of heavy FS activity. I had to hit the reset button before it would come back up again, too - the brownout hung the POST.

--
Every cloud has a silver lining (except for the mushroom shaped ones, which have a lining of Iridium & Strontium 90)
Re:Other OSes by kinema · 2003-09-14 18:24 · Score: 3, Interesting

I wonder if /dev/nvram (the small amount of NVRAM available on the RTC) is large enough to store such a dump.
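
A quick way to get a feel for the size problem, as a sketch only. The 114-byte capacity below is an assumption (it is what the Linux nvram driver commonly exposes on PCs, but check the target machine), and the record layout and function names are invented for illustration. Part of that area may already be in use by the BIOS, so the real budget could be smaller still.

```python
import struct

NVRAM_CAPACITY = 114            # assumed; verify on the target machine
HEADER = struct.Struct("<IQB")  # unix time, faulting address, message length


def pack_for_nvram(timestamp, address, message, capacity=NVRAM_CAPACITY):
    """Pack a tiny crash record, truncating the message to what fits."""
    room = capacity - HEADER.size
    if room < 0:
        raise ValueError("capacity too small for the header")
    msg = message.encode("ascii", "replace")[:min(room, 255)]
    return HEADER.pack(timestamp, address, len(msg)) + msg


def unpack_from_nvram(raw):
    """Recover (timestamp, address, message) from a packed record."""
    timestamp, address, length = HEADER.unpack_from(raw)
    msg = raw[HEADER.size:HEADER.size + length].decode("ascii")
    return timestamp, address, msg
```

With a 13-byte header that leaves about a hundred characters of text: enough for "what and where", nowhere near enough for a full dump.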
"Linux crashes" are probably contact failure. by Futurepower(R) · 2003-09-14 19:37 · Score: 3, Interesting

As others have said, the "Linux crash" is probably hardware failure.

The most common cause of serious failure, if the software has been installed correctly and tested, is bad contacts. To fix the problem, just loosen the screws that hold the adapter cards, pull the cards out about 1 millimeter or 1/32 of an inch, push the cards back in fully, and re-tighten the screws. Also, pull all connectors off a similar amount, and push them back on. Do the same with the memory modules. That's all.

The scraping caused by moving the contact points a tiny amount is actually very violent on a micro scale. The scraping removes oxide that causes a contact to lose electrical conduction.

This is reliable information. I've been selling and occasionally repairing PCs since before IBM sold PCs, back in the days when personal computers cost $2300, had two diskette drives and no hard drive, and ran the CP/M operating system.

My guess is that, if you had a penny for every real crash of a stable distribution of Linux, after a few years you might still have to borrow money from your little brother to buy a piece of bubble gum.