Solaris And Linux NFS Problems
mrgrumpy asks: "I run Debian (unstable, woody) with kernel 2.2.14 and everything dangerously up to date, and I also run a Solaris 7 box (SPARC, not Intel). NFS had been working fine between the two boxes (not with autofs, just mount and share) for the few weeks since I set them up. Then I applied a patch cluster from SunSolve to the Sun box and, lo and behold, NFS stopped working. The patch list included kernel patch 106541-10, which carries quite a few NFS fixes. I have the home directories and a development directory from the Sun box (which serves NIS, NFS, and just about everything else) mounted on the Linux box. Most of the time nothing goes wrong. But when I run the distributed.net client on the Linux box, which needs to read and write files mounted from the Sun box, it locks up and I get messages such as 'Apr 17 14:20:31 boink kernel: nfs: task 1473 can't get a request slot...' in the logs." Can anyone figure out what's going wrong here?
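A quick way to see how often the Linux client is starved for RPC request slots, and whether it is one stuck task or many, is to tally those kernel messages. The following is a minimal Python sketch, assuming syslog lines look exactly like the one quoted above; the script name and the log file you feed it (often /var/log/messages on Debian) are assumptions, not part of the original post.

```python
import re
import sys
from collections import Counter

# Kernel message format quoted in the question, e.g.
# "Apr 17 14:20:31 boink kernel: nfs: task 1473 can't get a request slot"
SLOT_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+(?P<host>\S+)\s+kernel:\s+"
    r"nfs: task (?P<task>\d+) can't get a request slot"
)


def summarize(lines):
    """Return (per_minute, per_task) counts of request-slot messages."""
    per_minute, per_task = Counter(), Counter()
    for line in lines:
        m = SLOT_RE.match(line)
        if m:
            per_minute[m.group("minute")] += 1
            per_task[int(m.group("task"))] += 1
    return per_minute, per_task


def report(lines):
    per_minute, per_task = summarize(lines)
    # Counter keeps insertion order, so minutes print in the order they appear in the log.
    for minute, count in per_minute.items():
        print(f"{minute}  {count}")
    print(f"distinct RPC tasks reported: {len(per_task)}")


# Only runs when a log file path is given, e.g.
#   python3 slot_report.py /var/log/messages   (hypothetical script name)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        report(fh)
```

A burst of messages from a single task ID points to one hung request, while many distinct task IDs suggest the whole RPC transport is stalled.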
"My machines are boink (GNU/Linux, 2.2.14 kernel) and splat (Solaris 7 [SPARC]). If I run snoop on the Sun box, I get:
root@splat$ snoop splat and boink rpc nfs
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B _lJAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9668
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B root.lock
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9E2D
boink.home.cyber4.org -> splat.home.cyber4.org NFS C REMOVE2 FH=009B _lJAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R REMOVE2 OK
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B root.lock
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9E2D
boink.home.cyber4.org -> splat.home.cyber4.org NFS C WRITE2 FH=79D9 at 0 for 4096 (retransmit)
boink.home.cyber4.org -> splat.home.cyber4.org NFS C CREATE2 FH=009B _yGAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R CREATE2 OK FH=AA9B
boink.home.cyber4.org -> splat.home.cyber4.org NFS C WRITE2 FH=AA9B at 0 for 1
This shows up all throughout the logs. I had read that NFS locking on Linux is not great, and I am using knfsd with NFS compiled into the kernel. Usually the errors cascade down a spiral of death until I reboot the Linux box. The Linux in.lockd doesn't have a debug mode (the Solaris one does: restart it with -d3), and I haven't been able to find any other way of debugging it."
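To see which NFS procedures are being retransmitted or left unanswered in a snoop capture like the one above, a small Python sketch can tally calls (C), replies (R), and retransmits per procedure. It assumes snoop's one-line-per-packet summary format exactly as shown above, saved to a text file by redirecting snoop's output. Because these summary lines carry no transaction IDs, calls and replies are matched only by count per procedure, so an "unanswered" call near the end of a short capture may simply have been answered after the capture stopped.

```python
import re
import sys
from collections import Counter

# One line of snoop's summary output, e.g.
# "boink.home.cyber4.org -> splat.home.cyber4.org NFS C WRITE2 FH=79D9 at 0 for 4096 (retransmit)"
LINE_RE = re.compile(
    r"^(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+NFS\s+(?P<dir>[CR])\s+(?P<op>\w+)(?P<rest>.*)$"
)


def summarize_trace(lines):
    """Count NFS calls, replies and retransmitted calls per procedure name."""
    calls, replies, retrans = Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the command line, blank lines, non-NFS packets
        op = m.group("op")
        if m.group("dir") == "C":
            calls[op] += 1
            if "(retransmit)" in m.group("rest"):
                retrans[op] += 1
        else:
            replies[op] += 1
    return calls, replies, retrans


def report(lines):
    calls, replies, retrans = summarize_trace(lines)
    # Counter subtraction drops non-positive counts, leaving only
    # procedures with more calls than replies in this capture.
    unanswered = calls - replies
    for op in calls:
        print(f"{op:10} calls={calls[op]} replies={replies[op]} retransmits={retrans[op]}")
    if unanswered:
        print("more calls than replies:", dict(unanswered))


# Only runs when a saved trace file is given as the sole argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        report(fh)
```

Run against the excerpt above, this would flag WRITE2 as the only procedure with outstanding calls, including the retransmitted 4096-byte write, which is consistent with the client stalling on writes rather than lookups.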
So at this point, mrgrumpy needs either a fix or a way to debug in.lockd. Are there any methods that might help recover from an NFS "spiral of death"?