Why You Shouldn't Reboot Unix Servers

Posted by CmdrTaco on Monday February 21, 2011 @05:44AM from the we-measure-uptime-in-years dept.

GMGruman writes "It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out. Paul Venezia explains why you should almost never reboot a Unix server, unlike, say, Windows."

Uh.. no by Anrego · 2011-02-21 05:48 · Score: 5, Informative

I for one believe in frequent-ish reboots.
I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.
Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.
The "my server's been up 3 years" e-pene days are gone, folks.

Re:Persistent myth? by SCHecklerX · 2011-02-21 05:52 · Score: 4, Informative

Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*
- root logins everywhere
- passwords stored in the clear in ldap (WTF??)
- require https over http to devices, yet still have telnet access enabled.
- set up sudo ... to allow everyone to do everything
- iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

Re:HP-UX says... by sribe · 2011-02-21 06:12 · Score: 4, Informative

Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.
Kind of a well-known, if very old, problem. From "Use of NFS Considered Harmful":
k. Unkillable Processes
When an NFS server is unavailable, the client will typically not return an error to the process attempting to use it. Rather the client will retry the operation. At some point, it will eventually give up and return an error to the process.
In Unix there are two kinds of devices, slow and fast. The semantics of I/O operations vary depending on the type of device. For example, a read on a fast device will always fill a buffer, whereas a read on a slow device will return any data ready, even if the buffer is not filled. Disks (even floppy disks or CD-ROM's) are considered fast devices.
The Unix kernel typically does not allow fast I/O operations to be interrupted. The idea is to avoid the overhead of putting a process into a suspended state until data is available, because the data is always either available or not. For disk reads, this is not a problem, because a delay of even hundreds of milliseconds waiting for I/O to be interrupted is not often harmful to system operation.
NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. A program using data on an NFS mount, however, can remain in an uninterruptable state until a final timeout occurs.
Workaround: Don't panic when a process will not terminate from repeated kill -9 commands. If ps reports the process is in state D, there is a good chance that it is waiting on an NFS mount. Wait 10 minutes, and if the process has still not terminated, then panic.
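
For anyone who wants to script the "is it in state D?" check from the workaround above, here is a minimal Python sketch. It assumes a Linux procps-style ps that accepts `-eo pid=,stat=,comm=`; other Unixes may name the state column differently. The helper names parse_uninterruptible and find_uninterruptible are made up for illustration and are not part of any tool mentioned in the thread. A process in state D is not necessarily blocked on NFS; it only narrows down where to look.

```python
import subprocess


def parse_uninterruptible(ps_output):
    """Return (pid, state, command) for each row whose state starts with 'D'.

    Expects rows shaped like "PID STAT COMMAND"; header rows and
    malformed rows are skipped.
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header line or something unexpected
        pid = int(fields[0])
        state = fields[1]
        command = fields[2].strip() if len(fields) > 2 else ""
        # 'D' is uninterruptible sleep; modifiers such as '+' may follow it.
        if state.startswith("D"):
            stuck.append((pid, state, command))
    return stuck


def find_uninterruptible():
    """Run ps (assumed: Linux procps syntax) and return the D-state rows."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,stat=,comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_uninterruptible(result.stdout)


if __name__ == "__main__":
    for pid, state, command in find_uninterruptible():
        print(f"{pid}\t{state}\t{command}")
```

If the same PIDs keep showing up across runs a few minutes apart, checking the NFS mounts before reaching for the reset button is probably the better next step.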