Why You Shouldn't Reboot Unix Servers

Posted by CmdrTaco on Monday February 21, 2011 @05:44AM from the we-measure-uptime-in-years dept.

GMGruman writes "It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out. Paul Venezia explains why you should almost never reboot a Unix server, unlike, say, Windows."

Showing 7 of 705 comments

Uh.. no by Anrego · 2011-02-21 05:48 · Score: 5, Informative

I for one believe in frequent-ish reboots.
I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.
Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.
The "my server's been up 3 years" e-peen days are gone, folks.
Re:Persistent myth? by SCHecklerX · 2011-02-21 05:52 · Score: 4, Informative

Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*
- root logins everywhere
- passwords stored in the clear in ldap (WTF??)
- require HTTPS rather than HTTP to reach devices, yet still have telnet access enabled.
- set up sudo ... to allow everyone to do everything
- iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.
Re:HP-UX says... by sribe · 2011-02-21 06:12 · Score: 4, Informative

Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.
Kind of a well-known, if very old, problem. From "Use of NFS Considered Harmful":
k. Unkillable Processes
When an NFS server is unavailable, the client will typically not return an error to the process attempting to use it. Rather the client will retry the operation. At some point, it will eventually give up and return an error to the process.
In Unix there are two kinds of devices, slow and fast. The semantics of I/O operations vary depending on the type of device. For example, a read on a fast device will always fill a buffer, whereas a read on a slow device will return any data ready, even if the buffer is not filled. Disks (even floppy disks or CD-ROMs) are considered fast devices.
The Unix kernel typically does not allow fast I/O operations to be interrupted. The idea is to avoid the overhead of putting a process into a suspended state until data is available, because the data is always either available or not. For disk reads, this is not a problem, because a delay of even hundreds of milliseconds waiting for I/O to be interrupted is not often harmful to system operation.
NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. Until that final timeout occurs, a program using data on the NFS mount can remain in an uninterruptible state.
Workaround: Don't panic when a process will not terminate from repeated kill -9 commands. If ps reports the process is in state D, there is a good chance that it is waiting on an NFS mount. Wait 10 minutes, and if the process has still not terminated, then panic.
Re:Uptime by 19thNervousBreakdown · 2011-02-21 06:35 · Score: 3, Informative

They're made of considerably smaller platters, so there's much less gyroscopic force (or whatever the fuck it's called). They spin down within minutes of being idle on most laptops. And every laptop these days comes with an accelerometer-based parking utility that parks the drive, no matter what it's doing, if it senses too much force. Those utilities are almost certainly configured to be over-conservative from the factory; generally it's difficult to even carefully pick a laptop up without it parking the drive.

--
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Re:Persistent myth? by Waffle+Iron · 2011-02-21 07:11 · Score: 3, Informative

And yes, either one works, but '\\' is not necessary, and it's a POS pattern that too many people follow because they don't or can't read the docs.
Here's a snippet from Microsoft's own current MSDN example on the PathMatchSpec() API call:

    ...
    void main(void)
    {
        // String path name 1.
        char buffer_1[ ] = "C:\\Test\\File.txt";
        char *lpStr1;
        lpStr1 = buffer_1;
        ...

Gee, I wonder where these people get their path separator ideas? Maybe it's because they *did* read the docs.
Re:Persistent myth? by TheHedonismBot · 2011-02-21 08:53 · Score: 3, Informative

Maybe. I see what you are saying, but as a counter-example, I sometimes run tcpdump from within my home directory when troubleshooting problems. tcpdump has to run as superuser, and I have a lot more faith in giving myself and other admins permission to run "sudo tcpdump" than running tcpdump setuid 0. Again, maybe I'm just missing something, but I really don't have a huge problem with tcpdump (or other admin tools) writing UID 0 data to an admin user's home directory.
You don't have to be root to use tcpdump. On Ubuntu, do this:
    sudo aptitude install libcap2-bin
    sudo setcap cap_net_raw,cap_net_admin=eip `which tcpdump`
If you run:
    getcap `which tcpdump`
and it shows:
    /usr/sbin/tcpdump = cap_net_admin,cap_net_raw+eip
then you're good to go. Now try running tcpdump as a regular user.
Re:Another Linux admin with a superiority complex. by The+Moof · 2011-02-21 09:47 · Score: 3, Informative

Why should I bother disabling it?
Generally, good administrators tend to disable services that aren't wanted or needed on their systems. Who's to say a vulnerability won't be discovered in that service down the road (*coughSolariscough*) and leave you exposed?