SGI announces Linux Kernel Crash Dumps (LKCD)

So, the value is... by dmercer · 1999-11-05 06:56 · Score: 3

From some of the posts above, I gather that there is some confusion about the significance of the functionality being provided by SGI with LKCD.

Yes, every reasonable operating system can be configured to save the core files resultant from a kernel panic to swap, and yes, many provide excellent tools for conducting a post-mortem analysis of the image to diagnose what caused it to croak. But in the past, with the notable exception of IRIX, this process required a fairly intimate knowledge of the operating system and even the underlying hardware, and was considered something of a black art. An excellent book on core dump analysis issues/procedures is 'PANIC!' Unix System Crash Dump Analysis, published by Sunsoft. IRIX, and now Linux when properly configured, automatically conducts the crash dump analysis upon re-entering multi-user, saving a legible and comprehensible report detailing what was going on at the time of the crash and providing a suggestion as to the cause.

This facility can be an excellent way of quickly tracking down the cause of the panic, or at least determining whether the problem lies in hardware or software. Below are three examples of some recent reports generated at our site:

Sample 1

Sample 2

Sample 3

While this utility is no replacement for an experienced sysadmin and a debugger when it comes to deciphering the cause of failure in complex systems (especially SMP), it will likely be a boon to the hundreds of thousands of Linux admins supporting small workgroup servers and workstations. And yes, Linux is stable.. but c'mon: kernels panic.

Did you GPL the code to crash the kernel? by CraigMcPherson · 1999-11-05 07:39 · Score: 3

You should have considered placing it under a BSD-style license. Microsoft's feelings are going to be hurt over the fact that they can't incorporate it into Windows 2000.

Redundant Kernels by cdlu · 1999-11-05 03:19 · Score: 2

I have been thinking about a solution to kernel panics and no-reboot kernel upgrades for a while, and here is the only thing I have come up with that seems viable:

We have redundant power supplies, hard drives, and many other pieces of hardware. I am thinking it may be good for developers, at any rate, to use redundant kernels. Kernel 1 dies, kernel 2 realises this and kills kernel 1 and takes over the system. Interrupt in service: a few clock cycles. Perhaps a new runlevel should be implemented into the linux kernel...runlevel 7, which would be against the POSIX standard I think, not sure, but would allow a condition in which the kernel is replacing itself in memory, by having a redundant kernel take over while one is being replaced in memory, and the second kernel handing off resources to the new primary kernel when it is ready, returning to the previous runlevel.

The long and the short of what I am saying is that there should be a second kernel in memory at all times ready to take over at any time, but programmed to not run until the first kernel dies or is being upgraded.

The disadvantage: it starts to consume extra memory resources, and process table entries, and will take a long time to perfect.

What do you think?

--
OFTC: By the community, for the community

Re:Redundant Kernels by jd · 1999-11-05 03:29 · Score: 2

Not really. What you're describing could be modelled as two virtual machines, each running Linux and High Availability software. If one kernel dies, the processes migrate to the second virtual machine.
This -could- be done with only minimal enhancements to Linux and the existing HA software - the support of two (or more) virtual machines within one (or more) physical machines.
Actually, this would go beyond crash recovery, as you could use this to do better scaling of multi-processor/multi-machine environments. Instead of trying to map N processes onto M components, you're only mapping N processes onto N virtual machines, and then N virtual machines onto M components. Because you already know and understand virtual machines, that's a much easier problem to solve.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:Redundant Kernels by Salamander · 1999-11-05 21:27 · Score: 2

>We have redundant power supplies, hard drives, and many other pieces of hardware. I am thinking it may be good for developers, at any rate, to use redundant kernels.
>...
>Interrupt in service: a few clock cycles

Sorry, but this doesn't work. You'd have to replicate the entire kernel address space including that used by drivers and on behalf of user programs for it to be effective. Many crashes actually result from corruption of some part of kernel memory, so if the two kernels share data they'd both crash at once. In addition, because the in-kernel data structures kept on their behalf might be different (if the two kernels were precisely identical they'd also crash in tandem) the user programs would have to be duplicated too. Now you've halved your memory and CPU resources, plus you're effectively doing a context switch "every few instructions" to go from one kernel to the other. Your performance is going to be totally abysmal.

The solution? Do what fault-tolerant systems already do and replicate the hardware as well. Been there, done that, works OK but gives lousy performance/dollar compared to non-FT alternatives. If you don't want to pay that premium you can go with a clustered highly available system such as the ones I once helped develop. Unlike an FT system, an HA system will allow an interruption when a component fails, but the duration of that interruption will be very short relative to a "vanilla" non-FT non-HA system. Also, in the absence of failures the component nodes of an HA cluster (a well-designed one, anyway) are able to process their own independent workloads instead of sitting around idle or duplicating each other's work.

--
Slashdot - News for Herds. Stuff that Splatters.

Re:Is this a new thing or just new to SGI? by DragonHawk · 1999-11-05 03:19 · Score: 2

This is not a core dump of a running application, but rather, a core dump of the entire running system. If a kernel failure occurs, this patch will dump the contents of system memory to disk, allowing you to analyze the system state from just before the crash occurred.

This would be very useful, for example, when debugging a device driver. It is not something the end-user, or even system administrator, is likely to use. It is for the kernel developer.

Other OSes (Sun Solaris, SGI IRIX, Novell Netware, to name a few) have had this capability, but Linux has not. Linux has traditionally dumped a summary of the kernel state to the screen, but that is (1) tedious to copy down by hand (which you have to, since the system is dead), and (2) not as complete as an entire system image is.

--

dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.

Re:Netware by Anonymous Coward · 1999-11-05 03:20 · Score: 2

WinDbg, while probably not redistributable, is a free download from MS. It can read NT kernel dumps. Try here or here. Unfortunately, they're already orienting these places to Win2k.

Re:sun suing by Guy+Harris · 1999-11-05 03:21 · Score: 2

IRIX machines have been doing this for quite a while....

IRIX isn't Sun's UNIX, it's SGI's UNIX. They're unlikely to sue themselves for stealing an idea from IRIX....

BSD, as others have noted, has had it for ages; many other flavors of UNIX probably got the idea (and, in some if not all cases, the code) from BSD.

Re: NT Memory dump by Raetsel · 1999-11-05 03:26 · Score: 2

Yes, NT has a "Write debugging information to:" option (generally %SystemRoot%\MEMORY.DMP). Then you have a copy of your last-state memory in this massive file. This is not news.

What jafac was saying is that Microsoft does not give you or offer any low-cost, distributable tools for making sense out of this massive pile of arcane characters.

Got 128 MB of system memory on a NT workstation? 512 MB on a server? Hope you've got Einstein and a couple years to sort through the thing by hand to find your problem!!

And UnknownSoldier has another very good point: The analysis tools are not cheap, and you can't share!

C'mon Microsoft, didn't you learn anything in kindergarten?

--

"...America's great minds of today, teaching America's great minds of tomorrow. Poor bastards." -- A Beautiful Min

Oops tracing is fun! by Kaz+Kylheku · 1999-11-05 03:55 · Score: 2

A good hacker should be able to make do with just a register dump, a stack trace and some program text surrounding the instruction pointer where things went belly up.

Hacking the kernel is supposed to be hard and tracing crashes given minimal information is a big part of the fun and attraction of ``iron man'' programming.

Then again, having a full dump doesn't necessarily make debugging that much easier. It's an incremental improvement over oops text.

Here is the real advantage: a dump is good from the point of view of users who need to report crashes to developers. I think that even a hack to get oops text (rather than a full dump) written to a partition would be better than asking the poor user to copy the oops text appearing on the frozen console down on a piece of paper! Forget it!

Re:Is this a new thing or just new to SGI? by Eric+Smith · 1999-11-05 14:39 · Score: 2

>Part of the problem with doing it under Linux is where do you dump to? And how do you know the location which the kernel points to for it to dump to isn't corrupted?

This problem isn't unique to Linux. I don't know what SGI has done (either for Linux or IRIX), but an obvious approach is to use a fixed location on the disk, such as the tail end of logical cylinder 0 (normally unused), or to designate a special "core dump" partition.

>Most Unices (eg Solaris, Irix, HP/UX) have some hardware support to help with doing this AFAIK,

Nope. It's all done in software. However, I could imagine that they might possibly have disk drivers with a special core-dump mode that is less dependent on the rest of the kernel, i.e., perhaps it would use polling rather than interrupts. On the other hand, maybe they just assume that the system is working well enough that the disk driver is OK. Often a panic that is caused in some other part of the system won't hurt the disk driver (although the file system code is more delicate, so a kernel dump ideally should bypass the FS).

>but with the wide variety of x86 hardware, not to mention all the other platforms Linux runs on, that's not a wonderful option in Linux.

Even on other OSes the core dump doesn't always work. If things get sufficiently screwed up, the system can't write to the disk. But in my experience on other systems, core dumps work most of the time, which makes them quite worthwhile.
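
To make the "polling rather than interrupts" idea a bit more concrete, here is a rough sketch in Python. It is purely a simulation under my own assumptions: FakeDisk, polled_write and dump_memory are hypothetical names invented for illustration, not anything from SGI's patch or from a real disk driver. The point is only that the dump path spins on a status check with a bounded retry count, so it needs nothing from the interrupt or scheduling machinery that may already be broken.

```python
# Hypothetical sketch: a polled (interrupt-free) block write, simulated in memory.
# FakeDisk, polled_write and dump_memory are illustrative names, not a real driver API.


class FakeDisk:
    """Simulated disk that reports busy for a few polls after each write."""

    def __init__(self, busy_polls=3):
        self.blocks = {}
        self.busy_polls = busy_polls
        self._remaining = 0

    def ready(self):
        # Each status check brings the pending operation one step closer to done.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def start_write(self, block_no, data):
        self.blocks[block_no] = bytes(data)
        self._remaining = self.busy_polls


def polled_write(disk, block_no, data, max_polls=1000):
    """Write one block, spinning on the status flag instead of waiting for an interrupt."""
    disk.start_write(block_no, data)
    for _ in range(max_polls):
        if disk.ready():
            return True
    return False  # device never became ready; give up rather than hang forever


def dump_memory(disk, memory, block_size=512, first_block=0):
    """Dump a bytes object block by block; return the number of blocks written."""
    written = 0
    for offset in range(0, len(memory), block_size):
        chunk = memory[offset:offset + block_size]
        if not polled_write(disk, first_block + written, chunk):
            break  # stop at the first block the device fails to acknowledge
        written += 1
    return written
```

The bounded max_polls is the design choice that matters here: a dump routine that can hang forever is worse than one that gives up, which matches the observation above that core dumps work most of the time rather than always.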

I worked at a router company for five years (and am going to start a new job at another router company on Monday). Our routers could core dump either to floppy or across ethernet to a TFTP server. We found core dumps to be very useful, both during development, and for analyzing failures at customer sites (which we obviously tried hard to avoid).

Some of the posters here seemed to question the utility of kernel core dumps, and point out that their kernel doesn't crash. While those people might not need the core dump feature, perhaps they should appreciate that it might help the developers maintain a high standard of quality going forward. As the Linux kernel continues to support an ever-increasing number of device types, expansion busses (such as 1394 and USB), file systems, etc., it will become correspondingly more difficult to keep it robust, and every tool that can be made available to the developers to assist with this should be welcomed.

Re:'bad programs' by Eric+Smith · 1999-11-05 14:42 · Score: 2

And if a problem in a userland program causes the kernel to crash, not only is the userland program possibly broken, but the kernel is definitely broken. This core dump feature will help debug such problems.

I've personally never seen a userland program crash the Linux kernel. The closest I've come is having bugs in the X server lock up the keyboard and display, but the machine was still running fine in all other regards, and I was able to telnet in and initiate a clean reboot.

Re:This implementation is much less than what BSD by Eric+Smith · 1999-11-05 14:46 · Score: 2

>This is *crazy*! That's like, uhm, like sort of a hack perpetrated by someone who was in a hurry and didn't know about prior art.

Hey, give them a break. This is just the first release; if people like it and encourage them (or work on it themselves), it will undoubtedly get better with time. After all, ROM wasn't built in a day. :-)

Linux kernel tools need work by tilly · 1999-11-05 04:04 · Score: 2

This is a good thing, but it is part of a more general problem.

And that problem is that we accept tools for Linux development that are distinctly sub par. There is a lot that could, and should, be done.

I would say more, but I cannot possibly say it better than this rant does.

Cheers,
Ben

PS The Microsoft program works right and has a bad interface, the Linux program has a nice interface but sucks! Whodathunkit? (Read the link.)

--
My usual seat in the cluetrain is at http://pub4.ezboard.com/biwethey.ht

Not JUST a core dumper by grumpy_geek · 1999-11-05 04:13 · Score: 2

Having done just a very quick glance over the specs I may be wrong, but I believe they are doing what they have been doing on the SGI for awhile. When an SGI running a newer flavor of IRIX does a system panic (SCSI, memory, whatever) it dumps a core out. Dumping this file is not for the drivespace weak: if you have half a gig of RAM you have a half a gig core file. But the beauty of this is that it then automatically examines the core file and tries to figure out what killed it; you don't have to go in and run the debugger yourself.

Having the machine tell you what memory page you were at when it took a dive makes life much nicer for the harried admin; of course if you want to dig through a core at a later time with your debugger you can, but it gives you a good starting point, and tends to make tracking things down much quicker since you have a guess as to where the problem resides. Having your box tell you that you had a memory error in SIMM 3 bringing the box down, having analysed the core file before you even have a chance to fire up your debugger, is a pretty nice thing.

Of course this is dependent upon my assumption that it works in the same kind of fashion as Irix (which it seems to).

Re:Is this a new thing or just new to SGI? by Unknwn · 1999-11-05 04:39 · Score: 2

Part of the problem with doing it under Linux is where do you dump to? And how do you know the location which the kernel points to for it to dump to isn't corrupted? Most Unices (eg Solaris, Irix, HP/UX) have some hardware support to help with doing this AFAIK, but with the wide variety of x86 hardware, not to mention all the other platforms Linux runs on, that's not a wonderful option in Linux.

With that said, this is a great thing in my opinion... though I haven't tried it yet to see exactly how they implemented it.

--
Jeremy Katz

Don't bother with NT for serious work by Morgaine · 1999-11-05 04:57 · Score: 2

But a fat lot of good a kernel debugger does you on a closed-source OS.

NT had the future almost in its grasp, but let it slip away by being impossibly unreliable and horribly admin-unfriendly compared to any Unix product. (We worked with it for a year but eventually had to discard it as a worthless toy.)

But that was then. Now it's just plain obsolete. Face it.

--
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra

Re:RTFM by jafac · 1999-11-05 04:20 · Score: 2

For me it's not the size of the binary file - I was already used to 256 - 512 Meg binary memory dumps of Netware (I guess now they have some sort of selective tool for NW 5 that makes the dumps much smaller).

What others have pointed out, and what I was saying is that, on my side of the phone, that does me NO good. Unless my customer has MS Visual Studio on it (which, by the way, usually screws up the delicate mix of MFC dlls and causes problems of its own), this file is useless.

On Netware, you could drop into the debugger, even on a production file server, check a few pages of the stack, and the registers, and jump back, and quite often, not greatly interrupt service (if you did it within the timeout period of the Netware Clients). Not possible on NT.
You learned ONE tool, console debugger, and it was the SAME interface and commands as the tool that examined the coredump files on your DOS machine at your leisure. I quite often used to talk customers through debugger sessions on the phone to gather information. Even when the customer was totally non-technical. This is not possible for NT, because if you're lucky, and the customer HAS a debugger, it most likely isn't one you're familiar with. And if the customer is non-technical (MUCH more likely on NT than Novell), again, you're SOL.

On NT, your ONLY option, 99% of the time, is to get the memory file transported to you (FTP or whatever), and send it to a programmer who has the time and the very expensive software to debug it. With Novell, ME, a non-programmer, a tech support guy, without a costly subscription to MSDN, without a costly copy of MS Visual Studio or SoftICE, could quickly and cheaply debug problems, compare the call stack to other incidents to see if it's a similar problem, or distill the pertinent information down to a paragraph or two and email it to a developer for debugging or suggestions; and if that wasn't enough info for the developer, I could THEN resort to getting the whole file, or trying to reproduce the problem.

The thing is, the integrated debugger solution gave support some granularity in how much resources were devoted to a problem. Now, we either have to be equipped like a developer (COSTLY!), or we have to forward MOST cases to a developer (COSTLY, and ugly!).

I know that Microsoft's reason for this was that an integral debugger compromises security (in theory, you could look up user-data with it that you wouldn't otherwise have rights to).
IMO, this is totally lame, because if the administrator was worried about security, the debugger could be disabled and locked out. And for cases where a debugger was needed, the administrator could go into the user setup, and check the box that enables the debugger.
The real reason was probably so they could increase the value of MS Visual Studio, and ask a higher price. Before NT came out, a debugger was commonly considered a necessity of life, and I can't think of one OS (other than Windows95) that a debugger didn't ship free of charge with (even dos had DEBUG.COM).

I wish I had a nickel for every time someone said "Information wants to be free".

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Re:SCSI only? by yakker · 1999-11-05 04:58 · Score: 2

The SCSI partition requirement is only because raw I/O only works for SCSI right now. As soon as an IDE driver works for raw I/O under linux2.X.Y, the LKCD project should be very easy to make work with IDE swap partitions. Please review http://oss.sgi.com/projects/lkcd/faq.html for more information about restrictions.

Re:... by yakker · 1999-11-05 05:00 · Score: 2

We actually ran into very few crashes, so we added code to the kernel to crash it for us with user level commands. The code for crashing the kernel is listed on the LKCD FAQ page: http://oss.sgi.com/projects/lkcd/faq.html

Re:Is this a new thing or just new to SGI? by Zooks! · 1999-11-05 05:05 · Score: 2

NetBSD can dump a core for the kernel. In fact, so can many other OS's.

The trick is writing a nice little routine that is solid enough and self-contained enough to dump the memory to disk when the kernel dies.

Of course this doesn't always work. The exception handler code might be messed up or the disk controller might be in some bad state, but for the most part, kernel exceptions aren't so fatal that they wedge the machine.

--

"I'm too old to use Emacs." -- Rod MacDonald

Netware by jafac · 1999-11-05 02:52 · Score: 3

I think that this was one of the greatest features of Novell - the fact that if your server was barfing, you could go into the debugger, and neuter an offending process; or if the server was really in trouble, it would drop into the debugger, so you could at least figure out what went wrong, or dump the memory image and send it to someone who could.

And also, it's one of the things I really, really, really HATE about NT. No debugger comes with the OS, and there's no free, distributable one out there, so from a tech support standpoint, if your customer's server barfs, you kind of have to guess at what went wrong, or establish a pattern from multiple calls, or try to reproduce it in-house. Switching from supporting Netware products to NT products has been hell, and this is 90% of the reason. This kind of thing in Linux can only help "the cause". (and because my company is working on some fairly significant Linux products, and I may end up supporting them, this makes me more optimistic about the future.)

I wish I had a nickel for every time someone said "Information wants to be free".

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Re:Netware by Ed+Avis · 1999-11-07 17:31 · Score: 2

Doesn't NT come with that useless 'Dr Watson' thing that, whenever something crashes, wastes your time with an unkillable dialogue box? Surely it must do _something_ useful - what is this 'application error log' that it keeps trying to create?

Also, Netscape license a thing called Full Circle that sends information back to Netscape HQ following a Navigator crash.

--
-- Ed Avis ed@membled.com

Re:Is this a new thing or just new to SGI? by Otto · 1999-11-05 02:54 · Score: 3

I've noticed that under Solaris that I've got core files after a crash.

As I understand it, the core files (which are not just Solaris, BTW) are a memory dump when an application crashes. I believe that it wasn't possible to do this with a kernel, because the kernel is the guy who is actually writing the core file. I'm probably wrong in specific bits here.

Anyway, core files can be extremely handy for debugging and such. They're just not very easy to examine. :-)

---

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

Re:sun suing by Guy+Harris · 1999-11-06 03:27 · Score: 2

>The NetApp is a BSD machine. NetBSD if I'm not mistaking.

You are mistaken. NetApp boxes do *N*O*T* run any flavor of BSD as their OS.

The underlying kernel is one written at NetApp; it doesn't support multiple address spaces, any notion of userland, or demand paging (heck, until recently, it didn't even change the page tables; it now uses the paging hardware, but only to make virtually contiguous physically-discontiguous pages, to make allocation of large chunks of memory a bit less painful).

A significant part of the code did come from BSD - the networking stack came from BSD (4.4-Lite, with some bits of the FreeBSD and NetBSD stacks thrown in), as did many of the commands (although those had to be chainsawed a bit to run in kernel mode in a shared address space), as did the dump and restore code (although the dump code was significantly changed to work with our WAFL file system). Various support routines also came from various BSDs, and the NFS server code is somewhat remotely derived from the BSD code (although it was also significantly changed to fit into our environment).

However, that doesn't mean NetApp boxes run anything you'd recognize as "BSD" (and, in particular, the crash-dumping code isn't BSD-derived, although the savecore command is based on the BSD command, although, again, significantly modified to run in our environment, and to extract the core dump information from the core dump areas on the disks).

(Yes, I know this first hand. I'm one of the developers there, and have been since early 1994.)

I doubt ext3 will be in Linux 2.4 by cpeterso · 1999-11-05 04:31 · Score: 2

The most recent Linux 2.3.25 kernel does not have ext3. ext3 is still way alpha. Linux 2.3 is already under feature freeze. If Linus plans to release Linux 2.4 by 2000 Q1, I doubt ext3 will be part of it.

--
cpeterso

I'm missing something... by roystgnr · 1999-11-05 04:32 · Score: 3

People here are saying that yes, even NT has the ability to dump kernel core when it BSODs, but:

What exactly are you supposed to do with a kernel core dump under a closed source OS? Throw a printout of it into a bonfire to propitiate the Windows Demons? Send it to Microsoft and wait for their rigorous QA process to leap into action and send you a fixed kernel? I can't imagine trying to debug it yourself without being able to get a backtrace and look at the problem source code. Does Microsoft even leave a symbol table of internal function names in the NT kernel? What exactly do you do with a Kernel Debugger in Solaris if you can't see anything more than what a disassembler will tell you about the kernel being debugged?

Could you pull that off with Vmware? by Greyfox · 1999-11-05 04:33 · Score: 2

I was just thinking the other day that running Linux in a VM in Linux would be handy for, among other things, somewhat more secure network services (Mail, web services, etc.) Run your server in a virtual machine and who cares if it gets cracked? Just wipe it clean and reinstall (Once you get it the way you like it, you could write the image to CD and just restore from CD every time you reboot.)

--

I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

Not new, but still very useful. by Yelskwah · 1999-11-05 05:14 · Score: 2

The core files you're seeing save the segment of memory in which the program was running. They can be used in conjunction with a debugger and image with debugging information to recreate the state of the application when it crashed, enabling the programmer to glean information about which instruction caused the crash.

Dumping the kernel on a crash is not new but it is useful, in much the same way.

Under HP-UX, as far as I remember, when the kernel crashes it is dumped into the swap device starting backwards from the end of swap. One of the first actions of the boot sequence (and boy can that take a long time) is to check whether there is a kernel image written in swap. If so, it's copied out and can be sent back to the kernel team for investigation.

Of course, if your boot sequence doesn't copy out the kernel, you've got a finite time to get it out yourself before it's overwritten by the ever-advancing swap data.
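
For anyone who hasn't seen this scheme, here is a rough Python sketch of the general idea: the panicking kernel writes the image at the tail of the dump area, and the boot sequence checks for it and copies it out. Everything specific is my own assumption for illustration. The MAGIC value, the header layout and the names write_dump and save_core are made up, and this is not the HP-UX (or any other OS's) on-disk format.

```python
import io
import struct

# Hypothetical layout: the last HEADER.size bytes of the dump area hold a magic
# string and the dump length; the image itself sits immediately before them.
MAGIC = b"KDUMP001"              # made-up marker, not any real OS's format
HEADER = struct.Struct("<8sQ")   # magic, image length in bytes


def write_dump(device, device_size, image):
    """Place the image at the tail of the device, followed by the header."""
    start = device_size - HEADER.size - len(image)
    if start < 0:
        raise ValueError("dump does not fit in the dump area")
    device.seek(start)
    device.write(image)
    device.write(HEADER.pack(MAGIC, len(image)))


def save_core(device, device_size):
    """At boot: return the saved image, or None if no valid dump is present."""
    device.seek(device_size - HEADER.size)
    raw = device.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    magic, length = HEADER.unpack(raw)
    if magic != MAGIC or length > device_size - HEADER.size:
        return None
    device.seek(device_size - HEADER.size - length)
    image = device.read(length)
    # Invalidate the header so the same dump is not collected twice.
    device.seek(device_size - HEADER.size)
    device.write(b"\0" * HEADER.size)
    return image


if __name__ == "__main__":
    swap = io.BytesIO(bytes(4096))          # stand-in for a swap device
    write_dump(swap, 4096, b"panic image")
    print(save_core(swap, 4096))            # the image written above
    print(save_core(swap, 4096))            # None: header was cleared
```

Writing from the end backwards means ordinary swap traffic, which fills from the front, reaches the dump last; that is why there is a window in which the image can still be rescued by hand.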

-John

Kernel Debug bug bug by the+eric+conspiracy · 1999-11-05 05:17 · Score: 2

I think that this is excellent news. And more so for Linux than any other system.

It is crucially important that a community project like Linux have good debugging tools, both from the perspective of quality control, and to encourage others to get involved in the community.

Other systems that are open but don't actively encourage contributions, or worse yet are closed - well, these debuggers are useful in the sense that they help pinpoint a problem. But in many cases you don't have control of the source code, so there isn't much you can do except mail it to the developers. If they even have a place to mail it to.

Re:Memory image will help. by yakker · 1999-11-05 05:18 · Score: 4

I figured I'd include an example of a crash dump analysis report that is created in /var/log/vmdump. This is what you'll get after a kernel panic (or something similar to it). You can also run 'lcrash' on the map/vmdump files and perform interactive analysis, such as a 'dump' of memory, or a 'dis'assemble of some instructions, etc. Sorry, the spacing's not going to look exactly right ...

======================= LCRASH CORE FILE REPORT ======================= GENERATED ON: Thu Nov 4 19:15:19 1999 TIME OF CRASH: Fri Nov 5 03:12:27 1999 PANIC STRING: User created crash dump MAP: map.5 VMDUMP: vmdump.5 ================ COREFILE SUMMARY ================ The system died due to a software failure. =================== UTSNAME INFORMATION =================== sysname : Linux nodename : peak-pc.engr.sgi.com release : 2.2.13 version : #1 SMP Fri Nov 5 02:59:34 PST 1999 machine : i686 domainname : engr.sgi.com =============== LOG BUFFER DUMP =============== Linux version 2.2.13 (root@peak-pc.engr.sgi.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Fri Nov 5 02:59:34 PST 1999 mapped APIC to ffffe000 (0026f000) mapped IOAPIC to ffffd000 (00270000) Detected 348940216 Hz processor. Console: colour VGA+ 80x25 Calibrating delay loop... 348.16 BogoMIPS Memory: 95448k/98304k available (1100k kernel code, 424k reserved, 1268k data, 64k init) Checking 386/387 coupling... OK, FPU using exception 16 error reporting. Checking 'hlt' instruction... OK. POSIX conformance testing by UNIFIX per-CPU timeslice cutoff: 100.26 usecs. CPU0: Intel Pentium II (Deschutes) stepping 02 SMP motherboard not detected. Using dummy APIC emulation. PCI: PCI BIOS revision 2.10 entry at 0xfcaee PCI: Using configuration type 1 PCI: Probing PCI hardware Linux NET4.0 for Linux 2.2 Based upon Swansea University Computer Society NET3.039 NET4: Unix domain sockets 1.0 for Linux NET4.0. NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP Starting kswapd v 1.5 Detected PS/2 Mouse Port. 
Serial driver version 4.27 with no serial options enabled ttyS00 at 0x03f8 (irq = 4) is a 16550A ttyS01 at 0x02f8 (irq = 3) is a 16550A pty: 256 Unix98 ptys configured PIIX4: IDE controller on PCI bus 00 dev 39 PIIX4: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio hda: WDC AC24300L, ATA DISK drive hdc: NEC CD-ROM DRIVE:28C, ATAPI CDROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide1 at 0x170-0x177,0x376 on irq 15 hda: WDC AC24300L, 4112MB w/256kB Cache, CHS=524/255/63, UDMA hdc: ATAPI 32X CD-ROM drive, 128kB Cache Uniform CDROM driver Revision: 2.56 Floppy drive(s): fd0 is 1.44M FDC 0 is a National Semiconductor PC87306 (scsi0) found at PCI 14/0 (scsi0) Narrow Channel, SCSI ID=7, 3/255 SCBs (scsi0) Warning - detected auto-termination (scsi0) Please verify driver detected settings are correct. (scsi0) If not, then please properly set the device termination (scsi0) in the Adaptec SCSI BIOS by hitting CTRL-A when prompted (scsi0) during machine bootup. (scsi0) Cables present (Int-50 YES, Ext-50 NO) (scsi0) Downloading sequencer code... 413 instructions downloaded scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.20/3.2.4 scsi : 1 host. (scsi0:0:6:0) Synchronous at 20.0 Mbyte/sec, offset 15. Vendor: IBM Model: DDRS-34560 Rev: S97B Type: Direct-Access ANSI SCSI revision: 02 Detected scsi disk sda at scsi0, channel 0, id 6, lun 0 scsi : detected 1 SCSI disk total. SCSI device sda: hdwr sector= 512 bytes. Sectors= 8925000 [4357 MB] [4.4 GB] 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.h tml eth0: 3Com 3c905B Cyclone 100baseTx at 0xdc00, 00:c0:4f:90:6e:54, IRQ 11 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. MII transceiver found at address 24, status 786d. MII transceiver found at address 0, status 786d. Enabling bus-master transmits and whole-frame receives. 
Partition check:
 sda: sda1 sda2 sda3
 hda: hda1 hda2
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 64k freed
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
dump_open(): dump device opened: 0x803 [sd(8,3)]
Adding Swap: 130748k swap-space (priority -1)
Adding Swap: 130748k swap-space (priority -2)
Kernel panic: User created crash dump
Dumping to device 0x803 [sd(8,3)] ...
Writing dump header ...
Writing dump pages ...

====================
CURRENT SYSTEM TASKS
====================
ADDR      UID   PID  PPID  STATE  PRI  FLAGS  MM        NAME
================================================================
c0234000    0     0     0      0    0      0  c0215320  swapper
c5ffa000    0     1     0      1   20    100  c5fb4060  init
c5fe8000    0     2     1      1   20     40  c0215320  kflushd
c5fe6000    0     3     1      1   20     40  c0215320  kupdate
c5fe4000    0     4     1      1   20    840  c0215320  kpiod
c5fe2000    0     5     1      1   20    840  c0215320  kswapd
c59ec000    1   248     1      1   20    140  c5fb4260  portmap
c5686000    0   263     1      1   20    140  c5fb4460  ypbind
c578c000    0   270   263      1   20    140  c5fb44e0  ypbind
c5644000    0   324     1      1   20    140  c5fb42e0  syslogd
c5602000    0   335     1      1   20    140  c5fb43e0  klogd
c55c4000    0   349     1      1   20     40  c5fb4560  atd
c5c3c000    0   363     1      1   20     40  c5fb41e0  crond
c55a2000    0   381     1      1   20    140  c5fb45e0  inetd
c5518000    0   395     1      1   20    140  c5fb4660  snmpd
c5348000    0   409     1      1   20     40  c5fb46e0  named
c52fe000    0   423     1      1   20    140  c5fb4760  routed
c5272000    0   437     1      1   20    140  c5fb47e0  xntpd
c523e000    0   451     1      1   20    140  c5fb4860  lpd
c51e4000    0   469     1      1   20    140  c5fb48e0  rpc.statd
c5194000    0   480     1      1   20     40  c5fb4960  rpc.rquotad
c5174000    0   491     1      1   20     40  c5fb49e0  rpc.mountd
c5158000    0   515     1      1   20    140  c5fb4ae0  rpc.rstatd
c513e000    0   529     1      1   20    140  c5fb4a60  rpc.rusersd
c511e000   99   543     1      1   20     40  c5fb4b60  rpc.rwalld
c50f6000    0   557     1      1   20    140  c5fb4be0  rwhod
c513c000    0   577     1      1   20    140  c5fb4360  rpc.yppasswdd
c5078000    0   589     1      1   20    140  c5fb4ce0  amd
c5086000    0   591     1      1   20     40  c0215320  rpciod
c504a000    0   592     1      1   20     40  c0215320  lockd
c4f54000    0   626     1      1   20    140  c5fb4de0  sendmail
c4f22000    0   641     1      1   20    140  c5fb4d60  gpm
c4e12000    0   655     1      1   20    140  c5fb4e60  httpd
c4e0a000   99   658   655      1   20    140  c5fb4ee0  httpd
c4e06000   99   659   655      1   20    140  c5fb4f60  httpd
c4dfc000   99   660   655      1   20    140  c4dfe040  httpd
c4df2000   99   661   655      1   20    140  c4dfe0c0  httpd
c4de8000   99   662   655      1   20    140  c4dfe140  httpd
c4dde000   99   663   655      1   20    140  c4dfe1c0  httpd
c4dd4000   99   664   655      1   20    140  c4dfe240  httpd
c4dcc000   99   665   655      1   20    140  c4dfe2c0  httpd
c4dc0000   99   666   655      1   20    140  c4dfe340  httpd
c4db6000   99   667   655      1   20    140  c4dfe3c0  httpd
c4a28000    0   699     1      1   20    140  c4dfe540  smbd
c499e000    0   710     1      1   20    140  c4dfe4c0  nmbd
c4658000    9   767     1      1   20     40  c4dfe840  actived
c48a2000    0   806     1      1   20    100  c5fb4c60  mingetty
c4928000    0   807     1      1   20    100  c5fb4160  mingetty
c4904000    0   808     1      1   20    100  c4dfe7c0  mingetty
c498a000    0   809     1      1   20    100  c4dfe440  mingetty
c4766000    0   810     1      1   20    100  c4dfe5c0  mingetty
c4c6e000    0   811     1      1   20    100  c4dfe640  mingetty
c479e000    0   812     1      1   20    100  c4dfe8c0  getty
c4798000    0   817   381      1   20    100  c5fb40e0  in.rlogind
c4976000    0   818   817      1   20    100  c4dfe740  login
c45b8000    0   819   818      1   20    100  c4dfe6c0  tcsh
c5204000    0   838   819      0   20      0  c4dfe940  crashdump

===========================
STACK TRACE OF FAILING TASK
===========================
================================================================
STACK TRACE FOR TASK: 0xc5204000 (crashdump)
 0 __dump_execute+153 [0xc010da21]
 1 dump_execute+149 [0xc011b925]
 2 panic+167 [0xc0114b6f]
 3 sys_setpriority+25 [0xc0115689]
 4 system_call+45 [0xc0107a61]
================================================================
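
Each task row in a report like this carries the same nine whitespace-separated columns (ADDR through NAME), so it is easy to post-process. Below is a minimal sketch in Python that assumes exactly that layout. The helpers `parse_task_row` and `ancestry` are hypothetical names made up for illustration and are not part of LKCD; FLAGS is kept as a raw string because the report does not say how to interpret it. The sketch follows the PPID column to show how the failing task was launched.

```python
from typing import Dict, List, NamedTuple


class Task(NamedTuple):
    addr: int    # task address, parsed as hex
    uid: int
    pid: int
    ppid: int
    state: int
    pri: int
    flags: str   # left unparsed; its encoding is not stated in the report
    mm: int      # parsed as hex
    name: str


def parse_task_row(row: str) -> Task:
    """Parse one task row (hypothetical helper, assumes nine columns)."""
    addr, uid, pid, ppid, state, pri, flags, mm, name = row.split()
    return Task(int(addr, 16), int(uid), int(pid), int(ppid),
                int(state), int(pri), flags, int(mm, 16), name)


def ancestry(pid: int, tasks: Dict[int, Task]) -> List[str]:
    """Follow PPID links upward until a parent is missing or is its own parent."""
    chain = []
    while pid in tasks:
        task = tasks[pid]
        chain.append(task.name)
        if task.ppid == pid:  # e.g. PID 0 lists itself as parent
            break
        pid = task.ppid
    return chain


# A few rows copied from the report above.
rows = [
    "c4798000 0 817 381 1 20 100 c5fb40e0 in.rlogind",
    "c4976000 0 818 817 1 20 100 c4dfe740 login",
    "c45b8000 0 819 818 1 20 100 c4dfe6c0 tcsh",
    "c5204000 0 838 819 0 20 0 c4dfe940 crashdump",
]
tasks = {t.pid: t for t in map(parse_task_row, rows)}
print(" <- ".join(ancestry(838, tasks)))
```

With only these four rows loaded, the walk stops at in.rlogind because its parent (PID 381, inetd) was not included in the sample.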

It's intended for kernel developers by EngrBohn · 1999-11-05 02:59 · Score: 3

So if you play with an x.(2y+1).z kernel while rubbing your feet on the carpet with a lightning rod attached to an ISA slot, then this is for you. If you only use an x.(2y).z kernel with z>2, then this'll probably do nothing more than occupy disk space.
Christopher A. Bohn

--
cb
Oooh! What does this button do!?
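
The x.(2y+1).z versus x.(2y).z shorthand in the comment above refers to the Linux numbering convention of the time: an odd middle number marks a development series, an even one a stable series. Here is a small sketch of that check, plus the commenter's z>2 rule of thumb. The function names are illustrative, the sample version strings are only examples, and the parser assumes plain x.y.z strings with no suffix.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'x.y.z' string into integers (no '-pre' style suffixes)."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def is_development(version: str) -> bool:
    """Odd middle number: development series under the old convention."""
    _, minor, _ = parse_version(version)
    return minor % 2 == 1


def probably_just_disk_space(version: str) -> bool:
    """The comment's rule of thumb: stable series (even y) with z > 2."""
    _, minor, patch = parse_version(version)
    return minor % 2 == 0 and patch > 2


for v in ("2.3.25", "2.2.13", "2.2.1"):
    print(v, is_development(v), probably_just_disk_space(v))
```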

Uhhhh, This isn't a new thing. by bifrost · 1999-11-05 03:00 · Score: 2

Just about every other OS I know of (except for NT) includes this. Having a Kernel Debugger, Kernel Core Dump, and a few other tools available over the past few *YEARS* has saved me a lot of hassle. If Linux hasn't had this till now, I'm sooooooooooo sorry. That's really disappointing.
*BSD, Solaris, Dynix, and bazillions of other OSes have had this ever since they were created.

Do we really need this? by Greyfox · 1999-11-05 03:07 · Score: 2

I don't think I've EVER had my kernel crash on me...

OK, I could see how it might help the developers... ;-)

--

I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

'bad programs' by mattdm · 1999-11-05 03:12 · Score: 2

It's not very likely to be a problem with a userland program, but rather something with the kernel itself -- maybe a third-party kernel module, or something you're hacking.

Doesn't BSD have this already? by seebs · 1999-11-05 03:13 · Score: 2

This sounds *EXACTLY* like the way BSD kernels have, since the dawn of time, handled panics. If you have enough swap space, the kernel dumps a complete core image (in a special format) to the swap device. Then, on boot, the system extracts the dump before enabling swap, and copies a kernel over alongside it. (Both go in /var/crash, if such a place exists.)

I've used this to debug (or have someone else debug) kernel panics on BSD/OS and NetBSD systems. It's a *very* nice feature, because, in the real world, you often have a crash that can't be encouraged to happen right when the engineer is handy.

Common feature, been available for years. I just *assumed* Linux had it.

--
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
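
The ordering in the BSD flow described above is the important part: the dump has to be copied out of the swap area before swap is enabled, or it may be overwritten. Below is a toy sketch of that boot-time sequence in Python, using an ordinary file to stand in for the swap device. Everything here is an assumption for illustration: the `DUMP_MAGIC` marker, the `vmcore.0` file name, and the function names are invented, and real kernels use their own header formats and tools.

```python
import os
import tempfile

DUMP_MAGIC = b"DUMP"  # hypothetical marker, not a real on-disk format


def save_crash_dump(swap_path, crash_dir):
    """Copy a dump out of the fake swap area; return the saved path or None."""
    if not os.path.isdir(crash_dir):
        return None  # no crash directory, so nothing is saved
    with open(swap_path, "rb") as f:
        data = f.read()
    if not data.startswith(DUMP_MAGIC):
        return None  # swap area holds no dump
    out_path = os.path.join(crash_dir, "vmcore.0")  # illustrative name
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


def boot(swap_path, crash_dir):
    """Return the ordered boot steps; saving always precedes enabling swap."""
    steps = []
    if save_crash_dump(swap_path, crash_dir) is not None:
        steps.append("dump saved")
    steps.append("swap enabled")  # only now may swap contents be overwritten
    return steps


# Demo with temporary files standing in for the swap device and /var/crash.
with tempfile.TemporaryDirectory() as root:
    swap = os.path.join(root, "swap")
    crash_dir = os.path.join(root, "crash")
    os.mkdir(crash_dir)
    with open(swap, "wb") as f:
        f.write(DUMP_MAGIC + b" kernel memory image")
    steps = boot(swap, crash_dir)
    saved_files = os.listdir(crash_dir)
print(steps, saved_files)
```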
