Slashdot Mirror


Why You Shouldn't Reboot Unix Servers

GMGruman writes "It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out. Paul Venezia explains why you should almost never reboot a Unix server, unlike, say, Windows."

10 of 705 comments

  1. Persistent myth? by 6031769 · · Score: 5, Interesting

    This is not a myth I had heard before. In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

    --
    Burns: We're building a casino!
    McAllister: Arrr. Give me 5 minutes.
    1. Re:Persistent myth? by mini me · · Score: 2, Interesting

      He is quite correct in his assertion that Linux and BSD are not Unix. Without experience with real Unix systems, it would be impossible for him to verify that they exhibit the same behaviour. However, Mac OS X is Unix. I find it hard to believe that someone posting on Slashdot has not at least spent some time evaluating OS X, even if they ultimately decided it was not for them.

    2. Re:Persistent myth? by element-o.p. · · Score: 4, Interesting

      As for "ALL=(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of Linux users by aping Windows privilege escalation through abuse of sudo.

      Yeah, I agree with you in principle, although to be fair, there really isn't a way that Ubuntu could know what user account you are going to set up before you actually set it up, and therefore, there isn't really a way for Ubuntu to create an appropriate sudoers entry to give admin privileges to the server admin.

      Learn to use groups, setfattr...properly...

      Okay, agreed...

      Learn to use...setuid/setgid properly...

      Ugh...setuid and setgid, IMHO, should be used as little as possible. If there's a security hole in your app, then having it setuid/setgid gives a sufficiently skilled user the ability to gain elevated privileges. I'd much prefer to use sudoers to give specific people I trust access to specific apps than to give any user access to an app I "trust" through setuid/setgid.
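      If you do end up with setuid/setgid binaries, it is worth knowing exactly which ones exist. A minimal sketch using POSIX `find` options (the function name `list_setid_files` is illustrative, not a standard tool):

```shell
# List regular files under a directory (default: /) that have the setuid
# or setgid bit set. -xdev keeps find on one filesystem, so NFS and other
# mounts are not crawled; permission errors are discarded.
list_setid_files() {
    find "${1:-/}" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null
}
```

      Running it periodically and diffing the output against a saved list shows when a new setuid binary appears.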

      ...leave admin commands to administrators, and you won't need sudo.

      Maybe I'm just missing something, but that sounds really stupid to me. While I'm a reasonably skilled Linux admin, I don't pretend to know everything, and maybe you can teach me something I've missed in my experience so far. If so, cool.

      But from my perspective, sudo is an ideal tool for granting appropriate permissions, as required, to trusted individuals. Sudo logs the user name and command in the log files, so if someone is abusing sudo, you know. Sudo can e-mail failures to admin staff, so if someone is habitually trying to exceed their permissions, you know. Sudo allows pretty fine-grained access based on group or user name, so you can easily allocate permissions as required (well, relatively easily, anyway), with much finer granularity than Unix user/group/other permissions allow. For example, with sudo you could give senior admins (group: admin) and web developers (group: www-dev) read/write access to CGI script directories, junior admins (group: jadmin) read-only access, and all other users (group: users) no access. Uh-oh...we've got four groups here: admin, jadmin, www-dev and users, so doing that with standard Unix permissions is going to be kind of difficult. (Members of admin could also be members of www-dev, I suppose, but I can imagine cases where group A needs access to a subset of the files that group B owns but shouldn't have access to another subset, which would really complicate things.)

      Sudo is a powerful tool and, just like all the other tools you mentioned, it should be used appropriately as one component of overall system security.
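      On the logging point: sudo's syslog entries are easy to summarize. A minimal sketch, assuming the traditional syslog line format with no PID after `sudo` (journald output, which adds `[PID]`, would need a small change to the pattern); `sudo_commands` is an illustrative name:

```shell
# Print "user command" for each sudo log line on stdin that records a command.
# Expected input shape (traditional syslog format), for example:
#   Mar  3 10:15:01 web1 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/sbin/tcpdump -i eth0
# Lines from other programs produce no output.
sudo_commands() {
    sed -n 's/^.* sudo: *\([^ :]*\) : .*COMMAND=\(.*\)$/\1 \2/p'
}
```

      Usage, assuming sudo logs to /var/log/auth.log as on Debian-style systems: `sudo_commands < /var/log/auth.log | sort | uniq -c | sort -rn`.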

      find /home/* -user 0 -print

      If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.

      Maybe. I see what you are saying, but as a counter-example, I sometimes run tcpdump from within my home directory when troubleshooting problems. tcpdump has to run as superuser, and I have a lot more faith in giving myself and other admins permission to run "sudo tcpdump" than in making tcpdump setuid 0. Again, maybe I'm just missing something, but I really don't have a huge problem with tcpdump (or other admin tools) writing root-owned (UID 0) files to an admin user's home directory.
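      A middle ground for the tcpdump case: run it via sudo but have it give up root before it writes the capture file, so the file in your home directory is owned by you. tcpdump's `-Z user` option does this: it switches to that user after opening the capture device and before opening any output file. A minimal sketch; the function name and default file name are illustrative:

```shell
# Capture on interface $1 into file $2 (default: a timestamped .pcap in $HOME).
# tcpdump starts as root via sudo, then -Z drops to the invoking user before
# the output file is opened, so the file is not owned by UID 0.
capture_as_self() {
    iface="$1"
    out="${2:-$HOME/capture-$(date +%Y%m%d-%H%M%S).pcap}"
    sudo tcpdump -Z "$(id -un)" -i "$iface" -w "$out"
}
```

      Usage: `capture_as_self eth0`, stopped with Ctrl-C. A file written this way would not show up in the `find /home/* -user 0 -print` check.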

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
  2. Virtualization to the rescue by Anonymous Showered · · Score: 4, Interesting

    I run web servers for a few dozen clients, and rebooting a remote machine was always scary. There was the risk that something (e.g. sshd) might not come up during startup and I would be locked out. I would then have to travel to my data center downtown (about 30 minutes away) and troubleshoot the problem. Since I don't have 24/7 access to the DC (I don't have enough business with the DC to warrant my own security pass...), I have to wait until they open to the general clientèle in the morning.
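    One way to take some of the fear out of remote reboots, even without a virtual console, is to watch for sshd to come back and raise the alarm if it doesn't. A minimal sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed; `wait_for_port` is an illustrative name:

```shell
# Poll host:port until a TCP connection succeeds or the deadline passes.
# Arguments: host, port, max seconds to wait (default 300).
# Returns 0 once the port accepts a connection, 1 on timeout.
wait_for_port() {
    host="$1"
    port="$2"
    deadline=$(( $(date +%s) + ${3:-300} ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
        # timeout stops a single attempt from hanging on an unreachable host.
        if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
            return 0
        fi
        sleep 5
    done
    return 1
}
```

    Usage, after issuing the reboot: `wait_for_port server.example.com 22 600 || echo "sshd did not come back"` (the host name is a placeholder).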

    With ESXi, however, I'm not that scared anymore. If something does go wrong, I have a console to the VM through the vCenter client (the application that manages virtual machines on the server). It happened once, when a major upgrade from FreeBSD 7.2 to 8.1 ran into problems. As it turned out, it was because I hadn't upgraded the VMware tools (open-vmware-tools port). Nonetheless, I managed to fix the problem through vCenter.

    This is why I love virtualization in general. It's making managing servers easier for me.

  3. I read TFA by pak9rabid · · Score: 2, Interesting

    What a load of horse shit.

  4. Not better than the others by cpct0 · · Score: 2, Interesting

    Quotes from stupid people:
    You should never reboot a Mac, it's not like Windows.
    You should never reboot Unix/Linux, it's not like Windows.

    Well, you shouldn't reboot Windows either. You reboot it when it goes sour. Our Windows servers seldom go sour, so we don't reboot them. Same for Mac or *nix.

    The problem is when it starts causing trouble. Like our /var/spool partition deciding it has better things to do than exist... or the ever-so-important NFS or iSCSI mount that decides to go west and gives us the dreaded ls full of ??? entries... with unmounting impossible, so remounting impossible, and all these stale files and stuff. You either spend hours tweaking things and cleaning up all the processes, or you reboot.
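    When an NFS mount does go west, the first step is finding out which one. A minimal Linux-specific sketch that reads mount entries in /proc/mounts format and reports NFS mount points that fail, or don't answer, a `stat` within a few seconds; it assumes coreutils `timeout`, and `check_nfs_mounts` is an illustrative name. iSCSI volumes appear as ordinary block devices with a local filesystem type (ext4, xfs and so on), so they are not covered here:

```shell
# Read "device mountpoint fstype options dump pass" lines on stdin and print
# each nfs/nfs4 mount point whose stat fails (e.g. ESTALE) or takes longer
# than 5 seconds; -k 2 sends SIGKILL if stat is still running 2 seconds later.
# Note: /proc/mounts escapes spaces in paths as \040; these are not decoded.
check_nfs_mounts() {
    while read -r dev mnt fstype rest; do
        case "$fstype" in
            nfs|nfs4)
                timeout -k 2 5 stat "$mnt" </dev/null >/dev/null 2>&1 || echo "$mnt"
                ;;
        esac
    done
}
```

    Usage: `check_nfs_mounts < /proc/mounts`.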

    In fact, as a good sysadmin, I make sure all my servers are MEANT to be rebooted if something goes sour. One SVN project goes sour? Check that it's not the repository itself that has problems, or that the system doesn't need to save something to exit safely... and if not, reboot the server. Everything magically restarts itself, does its little sanity check, and a quick look at a remote syslog confirms everything is all right. Two minutes lost for everyone, instead of 3 hours trying to clean up the mess left by some stray process or trying to kill the 100 rogue compression and rsync jobs that got started and are eating up all the RAM, CPU and network.
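    The "little sanity check" after a reboot can be as simple as confirming that the services the box exists for came back, then sending the result to the remote syslog. A minimal sketch, assuming a systemd-based system; the function name and the unit names in the usage line are illustrative:

```shell
# Check that each named systemd unit is active and print a one-line summary.
# Returns 0 if all units are active, 1 otherwise.
post_boot_check() {
    failed=""
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" || failed="$failed $unit"
    done
    if [ -z "$failed" ]; then
        echo "post-boot check: all $# units active"
        return 0
    fi
    echo "post-boot check: not active:$failed"
    return 1
}
```

    Usage with util-linux `logger` and a placeholder log host: `post_boot_check sshd svnserve | logger -n loghost.example.com -P 514 -t post-boot`.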

    Since each of our servers runs a single service and is either a VM or a physical machine of its own, it's a breeze to do this. iSCSI will diligently wait until the machine is back up before trying to reconnect. NFS will keep its locked files up and will reconnect to them. No, seriously, everything simply reconnects!

    Of course, the idea is to minimize these occurrences, so we learn from them and try to fix whatever caused the problem in the first place. There's a place for that: the server crash post-mortem. But there's no need to make users wait while we try to figure out wth happened.

  5. This is a myth? by pclminion · · Score: 4, Interesting

    I've heard a lot of myths. I've never heard a myth stating "You need to reboot a UNIX system to fix problems." If anything I've heard the opposite myth. Who promulgates this shit?

    I do remember ONE time a UNIX system needed a reboot. We (developer team) were managing our own cluster of build machines. The head System God was out of town for two weeks. We were having problems with a build host, and tried everything. Day after day. Finally, on the last day before System God was due to return, it occurred to me that the one thing we hadn't tried was to reboot the machine. The reboot fixed the problem, whatever it was.

    I felt stupid. One, for not figuring out the problem in a way that could avoid a reboot. Two, for not recording enough information to determine root cause in a post-mortem analysis. Three, for configuring a system in such a way that a reboot might be required in order to fix a problem.

    To this day I believe that reboot was unnecessary, although at the time it was the fastest way to resolve the immediate blocking issue.
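    Not having enough information for a post-mortem is cheap to avoid: grab the basic system state before rebooting. A minimal sketch using standard commands; the function name and default directory are illustrative, and /proc/meminfo and `dmesg` are Linux-specific:

```shell
# Save basic system state into a directory before rebooting.
# The default location is under /var/tmp, which usually survives a reboot
# (unlike /tmp on many systems). Each command's output, or its error
# message, goes to its own file; "|| :" keeps one failure from stopping the rest.
snapshot_state() {
    dir="${1:-/var/tmp/pre-reboot-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$dir" || return 1
    uptime            >"$dir/uptime.txt"  2>&1 || :
    ps -ef            >"$dir/ps.txt"      2>&1 || :
    df -h             >"$dir/df.txt"      2>&1 || :
    cat /proc/meminfo >"$dir/meminfo.txt" 2>&1 || :
    dmesg             >"$dir/dmesg.txt"   2>&1 || :
    echo "$dir"
}
```

    Usage, as root: `snapshot_state && reboot`.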

  6. I would have docked you a weeks pay... by Anonymous Coward · · Score: 1, Interesting

    ...for wasting company time on non-solutions instead of doing a reboot that took 1 minute.

  7. So what. 3650 days of uptime, who cares? by Anonymous Coward · · Score: 3, Interesting

    It makes a nice figure. Ten years. HP-UX running a few more or less referential databases. 3650 days. Was it patched properly? Did anyone *really* look after it? The only thing that can be said is that the machine room was apparently quite stable, with 10 full years of power and other services more or less intact.

    Then it was shut down for good.

    I'd rather see regular maintenance breaks and maintenance windows (pun not entirely intended) than collect numbers in the uptime command's output. But the story is true: after I left that company, not a single soul ever rebooted it. Ten years later they sent me an email with a PuTTY session attached. Ten years. :)

  8. Another Linux admin with a superiority complex. by nuckfuts · · Score: 1, Interesting

    Windoze admins...

    The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems. Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

    - require https over http to devices, yet still have telnet access enabled.

    I'm sure I have several devices on my network with telnet enabled. Why should I bother disabling it? I don't use it, so its vulnerability to password sniffing is irrelevant.

    And what do any of your gripes have to do with whether or not Unix servers should be rebooted?