Slashdot Mirror


Why You Shouldn't Reboot Unix Servers

GMGruman writes "It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out. Paul Venezia explains why you should almost never reboot a Unix server, unlike say Windows."

705 comments

  1. Uptime by cdoggyd · · Score: 5, Funny

    Because you won't be able to brag about your uptime numbers.

    1. Re:Uptime by Anrego · · Score: 5, Funny

      I once had to move my router (486 running slackware and with a multi-year uptime) across the room it was in. It was connected to a UPS, however the cable going from the UPS to the computer was wrapped through the leg of the table it was sitting on.

      I actually _removed the table leg_ so I could haul the 486 still plugged into the UPS across the room and quickly plug it in before it powered down!

      and then we had the first real substantial power failure in years like a few months later.. and the thing had to go down :(

      But yeah.. now I reboot frequently to verify that everything still comes up properly.

    2. Re:Uptime by idontgno · · Score: 2

      and then we had the first real substantial power failure in years like a few months later.. and the thing had to go down :(

      Perhaps caused by minor hard drive damage caused by relocating the system while under power?

      A rotary-media hard drive is fairly robust, if static. If spinning, it's more fragile than a Slashdotter's ego.

      I mean, it's your server, and it's an ancient 486 and all, so respect the hardware to the limit and extent you want to, but for me, if it's mine and uses hard drives, it doesn't move 2 inches or tip 5 degrees while it's powered.

      --
      Welcome to the Panopticon. Used to be a prison, now it's your home.
    3. Re:Uptime by kju · · Score: 1

      I once suffered from this illness myself. Thankfully I was able to overcome it.

    4. Re:Uptime by 0100010001010011 · · Score: 1

      In case you forget what to do WHEN you do reboot.

      http://thedailywtf.com/Articles/Designed-For-Reliability.aspx

    5. Re:Uptime by Anrego · · Score: 4, Funny

      I meant mains power.. due to a hurricane actually (hurricane Juan).

      The machine came out fine (and actually still runs.. though I don't use it as a router any more). Those old drives are surprisingly robust ..

      But yeah.. I was actually surprised.. and I did it more for the sake of the doing (the only reason I even left the machine going was because of the uptime). I'd never pull a stunt like that with a real machine :D

    6. Re:Uptime by Captain+Centropyge · · Score: 1

      Most notebook computers use spinning drives, and no one whines about moving those around while they're powered up. Just saying...

      --
      Bite my shiny metal ass!
    7. Re:Uptime by ehrichweiss · · Score: 1

      "Perhaps caused by minor hard drive damage caused by relocating the system while under power?"

      He clearly said "first real substantial POWER failure"(emphasis mine).....as in the power failed for longer than the UPS batteries could hold out for.

      --
      0x09F911029D74E35BD84156C5635688C0
    8. Re:Uptime by kaiser423 · · Score: 2

      But yeah.. now I reboot frequently to verify that everything still comes up properly.

      Yea, I do that too. Too many times I've been out of the house, and a power failure happens and not all of the boxes boot up correctly, and then I can't access my stuff, or the wife complains about the picture gallery being down, etc, etc. To me, it's a good admin practice. If you aren't 100% sure that your servers boot up properly, how exactly are you prepared for a failure? Take it offline off hours, give it a reboot and make sure that it comes up, services start, and it re-joins its place in the network properly. Should be standard machine admin practices....

    9. Re:Uptime by keitosama · · Score: 1

      Perhaps caused by minor hard drive damage caused by relocating the system while under power?

      I would assume he's talking about a power failure leaving him without electricity, not a hard drive getting busted up.

    10. Re:Uptime by maxume · · Score: 1

      The manual for mine says not to move it around while powered up (Lenovo).

      --
      Nerd rage is the funniest rage.
    11. Re:Uptime by Anonymous Coward · · Score: 0

      It costs thousands of dollars to remove a table leg? I'm sure it was replaced...

    12. Re:Uptime by similar_name · · Score: 1

      I once had a computer with a hard drive that had a cracked case. It was a tower and stopped booting up. I found that if I laid it on its side, turned it on, and then stood it upright after it started booting it worked fine. My assumption was that upright, everything in the HDD didn't line up right. On its side it did. I imagine that the gyroscopic nature of the platters in the hard drives kept them in place while I stood it upright again.


      My ex-wife once threw something towards me, striking my computer. My computer had no case, just a motherboard sitting on a desk with the power supply and drives next to it. Cables everywhere. It was running and the motherboard went off the table and was hanging by cables. I turned it off and back on, nothing. Reseated the RAM and presto I'm still using it 4 years later. Did I mention every piece of that computer was found next to dumpsters? I believe the CPU and MB were the only two components that came out of the same machine.

      Anyway, my point is computers are much sturdier than people think. In my completely objective all encompassing experience :) I've had more computers just stop working one day for no apparent reason than because of the brutal trauma I tend to give them.

    13. Re:Uptime by GuruBuckaroo · · Score: 1

      Wait, so the power failure was caused by damage to the hard drive? Did you actually read the comment?

      --
      Poor means hoping the toothache goes away.
    14. Re:Uptime by Wyatt+Earp · · Score: 1

      I do laptop support and I whine about moving them when powered up.

      Seriously, in my house a laptop gets put to sleep before you move it around too much.

      If you hand someone else the laptop, stand up, walk with it, set it down, close the lid for sleep or you get a stern look and a head shake from me.

    15. Re:Uptime by 19thNervousBreakdown · · Score: 3, Informative

      They're made of considerably smaller platters, so there's much less gyroscopic force (or whatever the fuck it's called), they spin down within minutes of being idle on most laptops, and every laptop these days comes with an accelerometer-based parking utility that stops the drive no matter what it's doing if there's too much force--they're almost certainly configured to be over-conservative from the factory, but generally it's difficult to even carefully pick a laptop up without it parking the drive.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    16. Re:Uptime by Captain+Centropyge · · Score: 1

      That's all fine and good, but it's not like someone sitting with a laptop ON THEIR LAP isn't going to shift around or move at all. If it's on a table, fine. But I guarantee that laptops are moved all the time without dire consequences. I've had a laptop for a few years that I've moved while on quite often and it's still running just fine.

      --
      Bite my shiny metal ass!
    17. Re:Uptime by Captain+Centropyge · · Score: 1

      That may be true in your house. But I doubt 95% of the population listens to your advice. I'm not saying it's not valid advice. But most laptops don't break just because you moved them while they were on. But if you drop the laptop while it's on, you'll probably have big trouble on your hands.

      --
      Bite my shiny metal ass!
    18. Re:Uptime by Captain+Centropyge · · Score: 1

      Good point! I didn't think of that.

      --
      Bite my shiny metal ass!
    19. Re:Uptime by monkyyy · · Score: 0

      1. init 6
      2.???????
      3. PROFIT

      --
      warning pointless sig
    20. Re:Uptime by Anonymous Coward · · Score: 0

      When you're in the Windows world, all simple activities (like removing a table leg) often cost thousands of dollars. You have to forgive them their lack of proper perspective.

    21. Re:Uptime by element-o.p. · · Score: 1

      I'm sure your employer would be thrilled at the money you're wasting just to make a number higher.

      I suspect you aren't completely serious (there's a subtle whiff of tongue-in-cheek humour in your post), but on the off chance that you really were being serious, I didn't get the impression that the original poster was talking about using his router in a professional environment. I rather suspect that if I was trying to use an old 486 running Slackware as a corporate router, my boss would be far less concerned that I had removed a chair leg to preserve uptime on the aforementioned router than the fact that I was using obsolete and unreliable hardware as part of the network infrastructure.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    22. Re:Uptime by Khyber · · Score: 1

      Nope! Dropped my HP bunches of times. The hard drive protection mechanism works damned well, I must admit.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    23. Re:Uptime by Wyatt+Earp · · Score: 1

      I deal with educators who travel in Alaska, if one of their laptops has a problem, 9 out of 10 times it's a total hard disk failure.

      Personally, I take care of my laptops, some people don't. Now an SSD device, I don't worry about moving parts failing, same with flash memory devices like my iPhone or iPad.

    24. Re:Uptime by mallyone · · Score: 3, Funny

      I bet any slashdotter that he still has a 3 legged table! :).

    25. Re:Uptime by Gilmoure · · Score: 1

      Had a dog knock over the table my iMac was on. The glass covering the screen shattered and the drive made this neat clicking thing, but once a new drive was installed, everything worked OK. Will be glad when we finally transition from these Victorian era storage devices, with spinning bits and gears and such, to nice solid state pieces.

      --
      I drank what? -- Socrates
    26. Re:Uptime by amiga3D · · Score: 1

      You probably wouldn't believe some of the junk we're using then. It barely ran XP Pro and then they upgraded it to vista, right after Win 7 became available. So now we're on vista on crappy hardware that struggled on xp. Half the workstations stay screwed on a rotational basis as the IT contractors come by and try to keep them going. I know of 6 people in my shop that bought Macs over the last year just because of their experience with Vista at work.

    27. Re:Uptime by Scarletdown · · Score: 1

      I rather suspect that if I was trying to use an old 486 running Slackware as a corporate router, my boss would be far less concerned that I had removed a chair leg to preserve uptime on the aforementioned router than the fact that I was using obsolete and unreliable hardware as part of the network infrastructure.

      But even if it was in a professional environment instead of for personal use, the fact that the router was:

      1 - working as intended
      2 - running for several years without needing a reboot

      then apparently:

      1 - the hardware was actually not obsolete for its purpose
      2 - the hardware was obviously not so unreliable after all

      Besides, unless your boss is a tech head and not a typical PHB, would you actually keep him informed about how everything really works?

      --
      This space unintentionally left blank.
    28. Re:Uptime by maxume · · Score: 1

      Yeah, I got it, I move mine around/use it as a laptop too, I was just pointing out that at least 1 manufacturer does whine about it.

      --
      Nerd rage is the funniest rage.
    29. Re:Uptime by Anonymous Coward · · Score: 1

      No, I worked there when this was done. The table leg was union. We had to call The International Service Union of Tables, Electricians and Goldman-Sachs. We don't know how G-S got on there. It cost way more than thousands. We had to declare bankruptcy. Oddly enough, the cost of selling everything, from business all the way down to left-over toilet paper, matched the bonus of G-S. Weird how that works.

    30. Re:Uptime by jmorris42 · · Score: 1

      > Take it offline off hours, give it a reboot and make sure that it comes up,
      > services start, and it re-joins its place in the network properly.

      That isn't enough. You have to simulate an entire network restart to simulate a long power failure. Discover how your UPSs behave and simulate a full unattended restart. Amazing how many small glitches you can find where a key machine comes up too late to provide DHCP or DNS and that cascades to the NFS server not being available, etc. Server grade hardware can take a long time to start, long enough for less robust clients to time out waiting on it.

      I finally scripted a bootwait service to drop on key machines to pause their start sequence until other key machines respond.
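
      A bootwait along these lines could be sketched as follows. This is only a guess at the idea, not the actual script: the `wait_for_hosts` helper, its parameters, and the plain TCP connect used as the "responds" check are all assumptions.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_hosts(targets, probe=tcp_probe, interval=5.0, max_wait=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until every (host, port) in targets answers, or max_wait passes.

    Returns the targets that never answered (an empty list means all are up).
    """
    pending = list(targets)
    deadline = clock() + max_wait
    while pending:
        # Keep only the targets that still fail the probe.
        pending = [t for t in pending if not probe(*t)]
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    return pending
```

      Called early in a machine's start sequence with something like `wait_for_hosts([("dns1.example.lan", 53), ("nfs1.example.lan", 2049)])` (hypothetical host names), it holds up the rest of the boot until the key machines respond, and the returned list says which ones never did.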

      --
      Democrat delenda est
    31. Re:Uptime by chimpo13 · · Score: 1

      UPS is your friend, not your mortal enemy. That's universal power supply, not the delivery service. The delivery service IS your mortal enemy.

    32. Re:Uptime by Anonymous Coward · · Score: 0

      Sadly I know folks like that. I always wondered about their sanity when it came to maintaining security and kernel patches.

    33. Re:Uptime by yanyan · · Score: 1

      Sorry, but that's uninterruptible power supply.

    34. Re:Uptime by DamnStupidElf · · Score: 1

      I think you'd be surprised to find out how much money management is willing to throw at making arbitrary numbers bigger. One word: Metrics.

      It doesn't matter what they measure, how valid or applicable they are, or how marginal the returns, but god help you if management starts latching onto them.

    35. Re:Uptime by crashumbc · · Score: 1

      and the failure rate on laptop hard drive is much higher than desktops...

    36. Re:Uptime by Twanfox · · Score: 1

      At least HP systems these days detect significant acceleration and immediately park the heads of the drive to prevent the possibility of damage. Now, that's not to say it's a great idea to move it while powered, but there are some safety checks in place to prevent damage to the best of their ability. Bothersome, too, when you're shutting down the laptop and pop it from its dock, only to have to wait 30 seconds for the system to unpark the heads and complete the shutdown after you whisked it away to pack it in your bag.

    37. Re:Uptime by russotto · · Score: 1

      Most notebook computers use spinning drives, and no one whines about moving those around while they're powered up. Just saying...

      It's unavoidable (unless you go to SSD) but it's still a problem, hence the IBM Active Protection System and the Apple Sudden Motion Sensor and a few others.

    38. Re:Uptime by Anonymous Coward · · Score: 0

      Those old drives are surprisingly robust ..

      Or are modern drives unsurprisingly flakey?

      Programmed obsolescence ftf...

    39. Re:Uptime by MattBD · · Score: 1

      Whenever my sister borrowed a laptop from me, she'd always bring it back, open and running. Every time she did it I wanted to scream "YOU F***ING IDIOT!" at her. Eventually I bought a new netbook and sold her my old Eee PC for £50, partly so I didn't have to deal with her doing that. I still cringe when I see her pick it up by the screen.

    40. Re:Uptime by chimpo13 · · Score: 1

      Yikes, yes. My mistake. It's hot in Saudi and I'm distracted from watching news of Libya and Bahrain. I'm near Yemen and I'm keeping an eye on that, too. If Yemen goes, I think the school staff will be evacuated.

    41. Re:Uptime by Anrego · · Score: 1

      Correct, this was at home not work (at work I'm a programmer.. being good sysadmins they wouldn't let me anywhere near the servers!)

      But I have to say, I've seen some pretty damn dicey stuff in production systems. Wouldn't surprise me one bit if some multi-million dollar operation was running through some old 486 cause they used it for a prototype and just left it in place.

    42. Re:Uptime by Pseudonym+Authority · · Score: 1

      He obviously works in a power plant, and that machine controls everything.

    43. Re:Uptime by mlts · · Score: 1

      I'm sure there are some. There are still some Novell 3.11 and 3.12 boxes still out there, so finding a 486 running Jolitz's 386BSD as a gateway might not be out of the question.

      I have seen dicey stuff in production systems as well. The problem is when people confuse "it works" with "it works at a production level". For example, Exchange running on a cast-off $300 machine bought from Wal-Mart might be suitable for a home network, but if it is needed for business production use, one needs a lot more hardware.

    44. Re:Uptime by sortius_nod · · Score: 1

      You obviously don't work support. The amount of replacement HDDs that I've had to hand out due to users not understanding that a laptop can't be thrown on the passenger seat of a car while it's on and open... Unless the heads are parked, moving a laptop while on is a stupid idea.

    45. Re:Uptime by David+Gerard · · Score: 2

      There's a reason my MP3 player and netbook both use flash. The MP3 skidded across the pavement as I was getting home this evening, and I have a 3yo daughter who delights in knocking things over.

      --
      http://rocknerd.co.uk
    46. Re:Uptime by mlts · · Score: 1

      So true. For example, if the app servers pop up before the database servers or the network fabric, it might be that the apps will need restarting manually, or at worst, the apps will eat themselves, requiring a restore.

      This is why I like having machines stay off if a power failure happens. This way, I can get the network fabric up, then the main services (DHCP, DNS, ACE, AD), then the databases up (and their containers checked for errors). Then the application servers (WebSphere, custom stuff, SAP). Finally the external Web servers.
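
      That bring-up order can be written down as a dependency graph and sorted, so the sequence lives somewhere other than one admin's head. A minimal sketch, assuming Python 3.9+ for the standard `graphlib` module; the tier names come from the list above, while the dependency edges and function names are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Each tier maps to the tiers that must already be up before it starts.
# The edges below are an assumed reading of the order described above.
DEPENDS_ON = {
    "network fabric": [],
    "main services": ["network fabric"],
    "databases": ["main services"],
    "application servers": ["databases"],
    "external web servers": ["application servers"],
}


def startup_order(depends_on):
    """Return tiers in an order that starts every dependency first."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Shut down in the reverse of the startup order."""
    return list(reversed(startup_order(depends_on)))
```

      A side benefit of sorting a graph rather than keeping a flat list: `TopologicalSorter` raises `CycleError` if two tiers end up depending on each other, which is the kind of thing worth finding out before the power fails.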

    47. Re:Uptime by RESPAWN · · Score: 1

      Funny story. At a previous employer, one of the UPS nodes (I can't seem to recall the technical term here) was so close to capacity that if you attempted to bring up all devices on that node, you'd end up popping the breaker on that node, and would have to reset the breaker in the UPS. Long story short, because of that we couldn't rely on all services to return after an outage, so somebody would have to stick around to bring up all devices in the proper order, lest we lose 1/3 of our servers due to a popped breaker.

      --

      If Murphy's Law can go wrong, it will.

    48. Re:Uptime by JamesTRexx · · Score: 1

      That'll have to be it then, except I move my laptops quite a lot, from room to room, also I change position on chair, couch, etc constantly while the disks are read/written to so I don't think any of them really park the heads at that time.
      And so far no problem with defective drives, apart from a few in stationary PCs.

      --
      home
    49. Re:Uptime by Captain+Centropyge · · Score: 1

      No, I don't work support. However, most users of notebooks don't listen to your advice about moving laptops while they're on. I move mine all the time, although I'm careful with it when I move it. But to think that users won't move the laptop while it's turned on isn't logical. After all, most people see that laptops and netbooks are portable, and think they can just take them anywhere at anytime. Yes, it may reduce the drive lifespan. But since most people see technology as something replaced every year or two, that's not a huge issue for most people. As I stated in another comment, I've had laptops with drives that have lasted several years, despite the fact that I move mine around while it's on all the time. Apparently your users are a bit less careful with laptops than I am. And it's not like this is some kind of large message that anyone really sees on the packaging for their laptop, or a warning screen, or even through word of mouth. (I'm sure the hardware manufacturers are fine with that, since it means they'll sell more hardware when it breaks.) It's sound advice. It's just that people don't even think about it, due to the nature of the device.

      --
      Bite my shiny metal ass!
    50. Re:Uptime by Anonymous Coward · · Score: 0

      I used to have an old 486 based router/firewall setup built from a random assortment of bits and pieces and sat in a cupboard under the stairs next to our central heating unit.

      That unit was incredibly reliable, even with the heater blasting just a few feet away!

      It ran basically non-stop for about 6 years, max uptime 2 and a bit years.

      Only major hardware failure in all that time was a when a spider decided that the network card would be a nice warm spot for a home and caused the old NE2K to overheat.

      Eventually we upgraded from our old ADSL modem to a new ADSL2+ one with everything built-in and the router was no longer needed, but still, sometimes I miss the reliability of that old box.

    51. Re:Uptime by Anrego · · Score: 1

      Good point.

      These days if you buy >4 consumer grade hard drives... you are almost guaranteed _one_ of them will be bad right out of the box. Yes, I know this is why you can buy the same spec drive for twice the cost with an "enterprise edition" tacked onto it... and I know this is the cost of the awesome $$ to GB value we have these days... but still!

    52. Re:Uptime by Anonymous Coward · · Score: 1

      In my house if anyone gave a stern look or a head shake, they'd be kicked out for being arrogant fucks.

    53. Re:Uptime by Anonymous Coward · · Score: 0

      A couple years ago I got in a new batch of 1u rack servers, delivered by UPS. One of the cardboard boxes had a TIRE TRACK on it. Fortunately, the box was 75% polyfoam and 25% server so the hardware didn't get damaged, but now I make sure I take a photo of everything they deliver before I open it.

    54. Re:Uptime by sumdumass · · Score: 1

      It's probably more of an issue of a demand for costs to go down along with increasing complexity with stuffing more and more storage into the same space.

      It used to be that all hard drives had a 3-5 year warranty on them. Then for some reason (known flaky supply parts?) it dropped to only 1 year. Now it's back up to 3 and 5 with the majority I'm seeing being 5 years. ( I wasn't thinking of OEM drives either.)

      Perhaps the new modern drives are robust again, we just haven't seen enough time pass to recognize it yet.

    55. Re:Uptime by SnarfQuest · · Score: 1

      and then we had the first real substantial power failure in years like a few months later.. and the thing had to go down :(

      Perhaps caused by minor hard drive damage caused by relocating the system while under power?

      Maybe I'm just totally uninformed, but how does a flaky hard drive cause a power failure? Unless parts exit the hard drive case, or the drive shorts out in a blast of smoke, I don't see any power failure mode that wouldn't also have some mention of razor-sharp shards or billowing clouds of smoke.

      Also, I misread the last 'l' in your trailing quote as a 't' at first read, which substantially changed its meaning.

      --
      Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
    56. Re:Uptime by sumdumass · · Score: 1

      Bought macs for work? Or for their home systems?

      This is a true story.. Had 2 and a half year old computers at a site with windows XP installed on them once. We needed to replace the cases on a couple of them because of some messed up desk they purchased which they didn't fit into. After the cases were changed out, the employees that sat at the computers started commenting on how much faster they were and how well they liked them. Before long, everyone was demanding new computers too. Instead of getting new computers, the owner decided to get new cases for each computer.

      Oh yea, in case you were wondering, the desk issue was where a ventilation channel ran through the center of the base and the tower just spanned the vent. Vibrations or whatever would cause them to move slightly to one side and they would fall into the vent chamber and knock some of the wires off in the rear of the system while dumping anything stored on top of it. Turned out the vent was supposed to have a cover but the owner purchased them second hand from another business that closed down and they didn't come with them. It was cheaper to replace the cases than the vent covers.

    57. Re:Uptime by Profane+MuthaFucka · · Score: 1

      My ego is NOT fragile. It's large and robust. I run my ego on Linux.

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
    58. Re:Uptime by sumdumass · · Score: 1

      I have two of them.. Well, actually, one is a 3.12 and the other is a 4. something.

      They don't do much besides sit on my network as a novelty. They became unneeded when a software package we used stopped supporting them and the company decided it would be cheaper to go with another vendor altogether than to upgrade the Novell machines. I had to keep them online for another couple years for backwards data access (they transferred much of it by hand) but that was years ago.

    59. Re:Uptime by Paracelcus · · Score: 1

      You mean, like the laptops in police cars? The ones mounted where the officer can see it while driving? Like on the TV show Cops, bouncing across open fields & RR tracks?

      --
      I killed da wabbit -Elmer Fudd
    60. Re:Uptime by grcumb · · Score: 1

      That isn't enough. You have to simulate an entire network restart to simulate a long power failure. Discover how your UPSs behave and simulate a full unattended restart. Amazing how many small glitches you can find where a key machine comes up too late to provide DHCP or DNS and that cascades to the NFS server not being available, etc.

      I just had to do this today. We had a category 4 cyclone warning over the weekend (I live in the Southern Hemisphere) and I had to take everything offline and secure the hardware in protected storage. The order in which we returned the various machines to service is part of our protocol, because experience has taught us exactly this lesson.

      Unsurprisingly, one of our DNS servers didn't come back properly, meaning that a bunch of related services were delayed.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    61. Re:Uptime by Anonymous Coward · · Score: 0

      Wow, do you keep your boxes in a bubble too? Like John Travolta and Xenu? http://www.archive.org/details/The_Boy_In_The_Plastic_Bubble

    62. Re:Uptime by Bing+Tsher+E · · Score: 1

      Just like any other business, there is a team of 'continuation engineers' involved in the product life of any piece of technology. If a piece of equipment seems to be lasting much longer than the warranty period, it's an opportunity for cost reduction.

    63. Re:Uptime by element-o.p. · · Score: 1

      My boss is indeed a tech head, and not a typical PHB. So, yeah, I'd probably keep him informed because if I don't he'll figure it out on his own.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    64. Re:Uptime by antdude · · Score: 1

      My longest uptime for any desktop PC was my RedHat Linux 7.x that had about 227 days of uptime. It was outdated and never got updates.

      Earlier today, I had to reboot my old Debian box with its 28 days of uptime because it didn't like my old Omni Cube KVM (PS2 and VGA) swap/switch (old one had problems). Keyboard was working, but mouse cursor was frozen in both console and X (X wasn't responding either). Windows XP Pro. SP3 didn't complain! Has anyone had this problem before? :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    65. Re:Uptime by phoenix321 · · Score: 1

      Use an SSD and/or buy a sturdy laptop, with automatic HDD head parking if magnetic platters are your thing.

      Toughbook or Thinkpad T/X-Series. You can throw these around (on a carpeted floor) and they don't mind. You get used to picking them up by the screen after a while, trust me :)

    66. Re:Uptime by phoenix321 · · Score: 1

      A good laptop should either
      - park the heads whenever it senses weightlessness (as in "falling", "floating", "being thrown")
      - not have a storage medium where delicate metallic parts are floating less than a hair's thickness above an absurdly delicate metal-coated disc at slightly subsonic speeds.

    67. Re:Uptime by Fulcrum+of+Evil · · Score: 1

      My solution would be to simplify - NFS dependencies are something to be avoided; my pref is that no server reexports NFS shares, and that NFS is isolated and granular - leaving a network to agglomerate over 10 years leads to situations where you have circular dependencies and no good way to power everything up. The way I see it, disk is cheap, so the only reason to have NFS is for backups (not a persistent or boot time thing) and data deduplication. YMMV of course, else we'd all be out of a job.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    68. Re:Uptime by twebb72 · · Score: 1

      You did the ol' George Costanza Frogger Move Across The Street.

    69. Re:Uptime by Fulcrum+of+Evil · · Score: 1

      The technical term is UPS, I think. Anyway, one of the cool things that power tech does is staggered boot - this was important when disk drives were the size of a washing machine, but still useful. I wonder if a fancy powerstrip could do that. At the moment, I'm running into the same problem with load - my old 350VA UPS can't hack two machines (one with GPU) and two monitors if the GPU decides to start doing a lot of stuff - Starcraft cutscenes suck when you get a hideous beeping over the dialog, but the actual game was ok. It does make me feel good though - if the combination of all the above kit just barely overstresses the UPS, then my power load during normal use should be fairly modest.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    70. Re:Uptime by Billlagr · · Score: 1

      It would, because to remove a table leg and put it back again, you'd need MS Carpentry Professional. You could get away with Standard, if you were only taking it off, putting it back is a Professional feature.

    71. Re:Uptime by Anonymous Coward · · Score: 0

      If that were the case, laptops wouldn't last 10 minutes.
      Get a clue.

    72. Re:Uptime by jmorris42 · · Score: 1

      > The way I see it, disk is cheap...

      And unreliable. And doesn't follow the user from workstation to workstation. It isn't a matter of cost though. Network mounted home directories are about moving to the closest machine, logging in and getting to work. And when a machine has a hardware problem being able to quickly drop a spare into the spot and troubleshoot at leisure.

      --
      Democrat delenda est
    73. Re:Uptime by Sulphur · · Score: 1

      I once had a computer with a hard drive that had a cracked case. It was a tower and stopped booting up. I found that if I laid it on its side, turned it on, and then stood it upright after it started booting it worked fine. My assumption was that upright, everything in the HDD didn't line up right. On its side it did. I imagine that the gyroscopic nature of the platters in the hard drives kept them in place while I stood it upright again.

      Gyrostabilized Processing System (GPS) ...

    74. Re:Uptime by mlts · · Score: 1

      One company I knew that still had a 3.12 machine had it for the express reason that they believed it was far more secure than any currently made server operating system.

      That, plus the fact that the antediluvian versions of Netware would hide any files and directories a user would have access to, rather than showing and giving an "access denied" error.

      How they got Netware to work with modern drive capacities, I will never know.

    75. Re:Uptime by Sanat · · Score: 1

      In the end all I got was a table with three legs!

      --
      And in the end, the love you take is equal to the love you make
    76. Re:Uptime by cachimaster · · Score: 1

      If you don't reboot, that means you haven't patched the kernel (seriously, how many of you use ksplice?). If you haven't patched the kernel, you have security vulnerabilities. If you have security vulnerabilities, you can be pwnd.

      Isn't it scary that I can find out your uptime remotely and then know exactly which exploit to use against you? Go ahead, brag about your uptime all you want, but it is not a good thing.

    77. Re:Uptime by Fulcrum+of+Evil · · Score: 1

      That fits with my strategy - I'm mostly avoiding crosslinked NFS shares in serverland. nfs homedirs work pretty well in my experience.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    78. Re:Uptime by tinkerghost · · Score: 1

      I ditched my KVM switch because of this. Mine was USB rather than PS2, but when you switched out to the other machine, it took the USB connection off line --- the 3rd or 4th time it happened it would lock up the USB driver for the hub and you had to move the plug to another port to get it to work again.

    79. Re:Uptime by antdude · · Score: 1

      Wow, how annoying. I was going to get KVM with DVI and USB, but geez so expensive for the last few years and these days. Frak that! I noticed my newest Intel i970 motherboard only has PS2 port for keyboard, but not the mouse (USB only). Ugh.

      KVMs are cool, but not reliable. :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    80. Re:Uptime by __aaxtnf2500 · · Score: 1

      Centripetal force. For a constant linear velocity, the force exerted on each point of the platter is inversely proportional to the radius, or distance from the axis of rotation. A platter, however, spins at a constant angular velocity, so across the platter the centripetal force increases linearly as the radius is increased. Stress in the platter does not follow as simple a relationship.
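
      A minimal Python sketch of that relationship, assuming a rigid platter spinning at a fixed RPM. The 7200 RPM figure and the radii are illustrative values rather than measurements from any particular drive, and `centripetal_acceleration` is a hypothetical helper name.

```python
import math


def centripetal_acceleration(rpm: float, radius_m: float) -> float:
    """Return the centripetal acceleration (m/s^2) at a given radius.

    Uses a = omega^2 * r, where omega is the angular velocity in rad/s.
    """
    omega = rpm * 2.0 * math.pi / 60.0  # revolutions per minute -> rad/s
    return omega ** 2 * radius_m


if __name__ == "__main__":
    # Illustrative radii on a 3.5" platter (assumed values).
    for radius_mm in (15, 30, 45):
        accel = centripetal_acceleration(7200, radius_mm / 1000.0)
        print(f"r = {radius_mm} mm: {accel:.0f} m/s^2 ({accel / 9.81:.0f} g)")
```

      Doubling the radius doubles the acceleration because the angular velocity is the same everywhere on the platter. The 1/r relationship only applies when comparing points that move at the same linear speed.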

    81. Re:Uptime by rtfa-troll · · Score: 1

      Damn right. All servers should be rebooted at least once a month, probably more. The redundant server will take over seamlessly (if you don't have a redundant server then the service isn't serious and you can do this during daytime). If you can't afford to run without redundancy then you should have at least two redundant servers. However, the article seems to be talking about rebooting to debug.

      --
      =~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
    82. Re:Uptime by Anonymous Coward · · Score: 0

      That reminds me of a time I first moved out of the parents' house to my own apartment, and just for shits n grins I wanted to keep one of my servers running not only for 3+ years but between two homes across town.

      I carefully put it in single user mode, parked the drive heads, had the inverter all ready in the moving van on a set of car batteries just for the purpose.

      Loaded the 3' tall tower and its UPS on a dolly and up in the moving van, raced to the other side of town to my new place and got the thing back in an empty room already designated for the computer gear.

      Got it back on the mains, and back in multiuser mode. All the disks survived the trip! The next day it was even back on the network.

      I only had bragging rights for about two weeks, as one day, while I was at work and the girlfriend was at home showering, the power went out. Since that room was the only one in the house that seemed to still have sounds and lights coming out of it, she figured she would finish her shower by plugging a hair dryer into the UPS.....

      To this day I can still imagine the sound of the short UPS squeal and sudden plunging into darkness after all the poor computer had been through.

    83. Re:Uptime by MichaelSmith · · Score: 1

      Do you think Saudi will go the same way?

    84. Re:Uptime by treeves · · Score: 1

      But that same manufacturer makes notebooks with accelerometers to keep the HD head from crashing when it is accelerating too much, to protect the HD, and touts that capability. They may make some without that feature I guess. Maybe that's why they complain: to get you to buy the more expensive machine.

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
    85. Re:Uptime by TheMidget · · Score: 1

      That's all fine and good, but it's not like someone sitting with a laptop ON THEIR LAP isn't going to shift around or move at all. If it's on a table, fine. But I guarantee that laptops are moved all the time without dire consequences.

      However, keeping it ON YOUR LAP can lead to dire consequences. Especially if you are not moving it enough...

    86. Re:Uptime by TheMidget · · Score: 1

      The amount of replacement HDDs that I've had to hand out due to users not understanding that a laptop can't be thrown on the passenger seat of a car while it's on and open...

      There's quite a difference between carefully moving a laptop, and throwing it...

    87. Re:Uptime by AmiMoJo · · Score: 1

      They are not that sensitive, otherwise you would never get any work done on the train or in a car (at least on UK rail/roads). The forces that keep the head floating above the disc are actually quite strong and will easily resist a fair amount of acceleration. The biggest effect it has is on the head's ability to seek to the right part of the disc, i.e. lateral movement.

      Certainly the amount of acceleration from picking a laptop up is not enough to park the heads.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    88. Re:Uptime by tinkerghost · · Score: 1

      Cheap KVMs aren't reliable. I've used HP & Dell Rack KVM systems and never had a problem. The cheap $25 2 port KVM systems generally seem to cause more problems than they solve.

    89. Re:Uptime by antdude · · Score: 1

      How much are those rack ones? Those USB and DVI ones are expensive even for consumer ones! :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    90. Re:Uptime by Anonymous Coward · · Score: 0

      "a substantial power failure" is mentioned and you think a possibly damaged HDD is a more likely cause of downtime than the UPS running down? Even after him saying they had to move it across the room *quickly*?

    91. Re:Uptime by chimpo13 · · Score: 1

      I'm actually working in Saudi. Just above the Yemen border. It's in my own best interest to pay attention to what's going on. This city was shelled a year ago and in December there was a firefight with al-Qaeda.

    92. Re:Uptime by chimpo13 · · Score: 1

      I think Saudi will be fine. There's no taxes here and everything is subsidized. The people are well fed. Still, it's the ME so any monetary bet on it wouldn't be make or break.

    93. Re:Uptime by drinkypoo · · Score: 1

      There's a reason my MP3 player and netbook both use flash.

      Yes, battery life. Seriously, these days they build accelerometers into everything, it costs a few cents.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    94. Re:Uptime by Belial6 · · Score: 1

      If you are having that high of a failure rate on brand new drives, I would suggest that you should either treat the drive better on the way home, or choose a different store to buy your drives from. A 20% defective rate is not normal.

    95. Re:Uptime by tinkerghost · · Score: 1

      Did a PC on Dell's site. The 17" LCD/Keyboard/Monitor setup that's closest to what I used to use runs about $1200. It's a 1U drawer that holds a flip-up monitor - essentially a laptop with an 8 port KVM instead of a Motherboard. Downside is it's VGA not DVI.

      None of the rack mounts w/ monitor/keyboard support DVI, so you would have to get the KVM and a rack mount keyboard/monitor to support it. The 4 port DVI KVMs start at around $200 and push upwards of $600 for extra features - >$1k for IP connectivity.

    96. Re:Uptime by ReedYoung · · Score: 1

      Brutal trauma can have cumulative effects. I appreciate that you're being humorous and not really making sweeping generalizations, but still ...

      In my completely objective all encompassing experience :) I've had more computers just stop working one day for no apparent reason than because of the brutal trauma I tend to give them.

      ... I have to say, just because it didn't break immediately doesn't mean the abuse isn't what broke it.

      Anyway, my point is computers are much sturdier than people think... I once had a computer with a hard drive that had a cracked case. It was a tower and stopped booting up. I found that if I laid it on its side, turned it on, and then stood it upright after it started booting it worked fine. My assumption was that upright, everything in the HDD didn't line up right. On its side it did. I imagine that the gyroscopic nature of the platters in the hard drives kept them in place while I stood it upright again.

      I would consider that more a tribute to your ingenuity and knowledge of the Newtonian mechanics involved, than evidence that the hardware itself is "much sturdier than people think." An expert auto mechanic can squeeze a couple 10,000 more miles out of a rickety old jalopy. That doesn't mean the vehicle is now sturdier than previously believed. It just means some mechanics are better than others.

      Yeah, you can get more usage out of the same hardware than most people think is possible, but it's because of your skill, not because of the high quality of the hardware. That really is carelessly mass-produced by Chinese slave labor. The tip of the hat rightfully goes to you, not to WD, Seagate, Maxtor or whatever.

      --
      "I can't imagine how things could get any worse!" (some guy) "That could just be failure of imagination on your p
    97. Re:Uptime by Belial6 · · Score: 1

      "Stupid" idea? No. Laptops are designed to be portable. They are sold as portable devices. They are marketed to people that the designers KNOW will move them while they are on. Laptops so delicate that they break because they were moved while on have a design defect. Blaming the user for moving their portable device is no better than telling them that they are holding their phone wrong.

      Any problem associated with moving a laptop while on is a design failure, not a user error. It would be trivial to force the laptop to completely shut down if it is moved while on. Not one manufacturer does this because they know that their users would not accept this. Why? Because the user is buying a product that can be moved while on. It is the manufacturers who are agreeing to sell the users what they want, and then failing to design a product that meets the known intended use.

    98. Re:Uptime by X0563511 · · Score: 1

      Most notebooks use drives designed for notebooks. Drives that have shorter, more rigid platters, shorter armatures etc...

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    99. Re:Uptime by X0563511 · · Score: 1

      I've also noticed that the platters tend to be a bit thicker too, meaning they flex less to begin with. The armatures are also shorter because of the platter size, and seem to be designed to not flex as much either. The whole thing's been engineered for the purpose... imagine that!

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    100. Re:Uptime by antdude · · Score: 1

      Way out of my budget. :( Any cheaper ones?

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    101. Re:Uptime by Racemaniac · · Score: 1

      hd's have a safety for that, they shut themselves down i think in such cases
      i've gotten to know about that since on my laptop there was some issue with that feature that caused my windows to BSOD when i moved my laptop around :D

    102. Re:Uptime by Anonymous Coward · · Score: 0

      That has happened in so many places it half makes me wonder if Apple was paying IT contractors to be incompetent. The number of people I know that had shitty IT implementations of XP and/or Vista on hardware that was old in 1998 could fill a phone book. Most of them have sworn off Windows forever, and are now using OSX. Then again, most of them are somewhat upset with Apple over the switch to Intel, because they all jumped on the PPC bandwagon. With the recent WoW patch that removes PPC support, some of them have started to consider buying a cheap Windows machine instead of another expensive Apple box.

    103. Re:Uptime by anyGould · · Score: 1

      Uptime is like a winning streak - nice while you have it, mourned when broken, but not worth getting too worked up about.

      (Although I do miss my old 486/25 running Linux - I think I broke a year of uptime, simply because it just never stopped working.)

    104. Re:Uptime by mistiry · · Score: 1

      universal power supply

      Uh, chimpo...it's 'Uninterruptible Power Supply'...

  2. Persistent myth? by 6031769 · · Score: 5, Interesting

    This is not a myth I had heard before. In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

    --
    Burns: We're building a casino!
    McAllister: Arrr. Give me 5 minutes.
    1. Re:Persistent myth? by SCHecklerX · · Score: 4, Informative

      Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*

      - root logins everywhere
      - passwords stored in the clear in ldap (WTF??)
      - require https over http to devices, yet still have telnet access enabled.
      - set up sudo ... to allow everyone to do everything
      - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

    2. Re:Persistent myth? by hedwards · · Score: 1

      People who don't know any better. On Windows systems sometimes the system gets so that there's a bit of corrupted memory that prevents a program from running correctly if the computer isn't completely shut off and left to sit for a few seconds before being turned back on. I was personally skeptical until I saw that work for myself. I still don't really understand why that's the case, but IIRC it had to do with some errors you could run into with Autocad.

      I'm not familiar with Unix itself enough to comment, but with both Linux and *BSD you're able to start and restart services without a reboot and the architecture is such that you're much less likely to end up in a situation where you can't perform whatever action you need to in order to clear the error manually. I'm not sure how I would even go about looking up how to do a lot of that with a Windows box.

    3. Re:Persistent myth? by afabbro · · Score: 5, Insightful

      This is not a myth I had heard before.

      +1. This article should be held up as a perfect example of building a strawman.

      "It's a persistent myth that some natural phenomena travel faster than the speed of light, but at least one physicist says it's impossible..."

      "It's a persistent myth that calling free() after malloc() is unnecessary, but some software engineers disagree..."

      "It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

      --
      Advice: on VPS providers
    4. Re:Persistent myth? by Anonymous Coward · · Score: 0

      On my server the BIOS battery is dead for the last one or two years. Rebooting it means BIOS gets fucked up and it can't boot and needs manual intervention. Otherwise it has been quite stable for the last 3 years. Next reboot for that server is probably never going to happen - it will run until it dies and/or is replaced. Most likely destination is the scrap heap. Heck, it only had a handful of reboots in the last 7 years of operation...

      Secondly, rebooting a well written OS generally does *nothing*.

    5. Re:Persistent myth? by trybywrench · · Score: 1

      I came here to say the same thing, I've never thought to reboot a unix box to fix a problem. In fact, in the face of a serious operating system issue I want to do everything I can do to avoid the temporary purgatory that is a reboot.

      --
      I came to the datacenter drunk with a fake ID, don't you want to be just like me?
    6. Re:Persistent myth? by starfishsystems · · Score: 0

      The myth - as with so many others, it seems to me - arises from relatively junior people bringing their unquestioned practices and prejudices in from the Microsoft world.

      I see it all the time. Often it's a "reach for the GUI" reflex toward making something work. A Unix veteran would look for a config file, would save the original version of the file before experimentally investigating, would then restore the original file if the investigation came up empty. A Windows veteran would simply click on things in the application to see if it made any difference, and after whatever success or failure emerges from the experiment, would then walk away, having tracelessly changed that application from its original state.

      --
      Parity: What to do when the weekend comes.
    7. Re:Persistent myth? by arth1 · · Score: 4, Insightful

      Don't forget 777 and 666 permissions all over the place, and SELinux and iptables disabled.

      As for "ALL=(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of linux users by aping Windows privilege escalations by abusing sudo. Learn to use groups, setfattr and setuid/setgid properly, leave admin commands to administrators, and you won't need sudo.

      find /home/* -user 0 -print

      If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.
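
      For anyone who would rather script the same check, here is a rough Python sketch of what that `find` command looks for. `owned_by` is a hypothetical helper name, and the sketch assumes a Unix-like system where uid 0 is root. Unlike `find /home/*`, it only tests entries below the starting directory, not the starting directory itself.

```python
import os
from typing import Iterator


def owned_by(top: str, uid: int = 0) -> Iterator[str]:
    """Yield every file or directory under `top` owned by the given uid."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() inspects a symlink itself rather than its target.
                if os.lstat(path).st_uid == uid:
                    yield path
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue


if __name__ == "__main__":
    for path in owned_by("/home"):
        print(path)
```

      An empty result is the good outcome here; anything it prints is a candidate for the kind of sudo misuse described above.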

    8. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*

      "Windoze" ... "boxen" ... Your poor employer; somehow I feel like things aren't going to be improving much. Seriously, as a Unix person myself, you're embarrassing me. Makes me feel the need to throw out this disclaimer: Most Unix admins and programmers are actually not nearly as immature as this person!

    9. Re:Persistent myth? by arth1 · · Score: 2

      Unfortunately, the GUI-befuddled people cause problems even on distro levels. Perfectly serviceable text configuration files give way to humongous xml files, or even databases without a plain text front end.
      This makes administration a real pain, and adds nothing except catering to the point-and-drool generation.

    10. Re:Persistent myth? by Ephemeriis · · Score: 1

      This is not a myth I had heard before. In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

      The idea that you ought to just reboot to fix things comes from the Windows world.

      I've got several Windows servers that absolutely have to be rebooted nightly to keep them running happily. This isn't because I'm some crappy admin or anything like that... Rather, the software running on them just isn't stable. It's actually the vendor's suggestion that these servers be rebooted nightly. Not that particular services need to be restarted - but that the entire box should be rebooted.

      I'm not entirely sure what the problem is... Corrupt data in RAM? Memory leaks? Files not closing right? Whatever. They need to be rebooted, or they become cranky.

      I'm OK with that. It's what the vendor recommends. It's what we do for those boxes. It generally works.

      But we've also got a few Linux boxes... And we do not reboot those when things go wrong. We've got Linux boxes that've been up and running for years. If something goes wrong on one of our Linux boxes, it's probably because somebody screwed up a config, or an update went awry, or a bit of hardware is failing.

      When something breaks on a Linux box, and we call support, the answer has never been "reboot it". They always want to see what is going on in the system as-is, and they've always been able to fix the issue.

      I haven't personally been bit by rebooting a Linux box and making everything worse... But I've seen enough other people get bit, and I've read enough horror stories on-line.

      --
      "Work is the curse of the drinking classes." -Oscar Wilde
    11. Re:Persistent myth? by ByOhTek · · Score: 1

      I wouldn't read too much into it. From what I can tell the author is an idiot. He knows some stuff, probably to an impressive extent even, but he's too arrogant and one-size-fits-all.

      I don't know of any Unix admin who reboots early-on. Even the few I know (myself included) who came over from windows (or still admin it).

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    12. Re:Persistent myth? by Dracos · · Score: 3, Funny

      I'm not familiar with Unix itself enough to comment, but with both Linux and *BSD...

      I'm not sure how to respond to that.

    13. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Most unix systems I have been associated with even had /etc/ checked into a revision control system to track who changed what and when.

    14. Re:Persistent myth? by vux984 · · Score: 1

      In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

      I take a somewhat contrary stance, rebooting is like testing the backup recovery procedure, or the backup power system... you have to do it to know that you can do it.

      If you are afraid to reboot your server when it's working fine because you don't know it will come back up, then you ALREADY HAVE A PROBLEM.

      That said, I fully understand the desire not to reboot especially if it may take down a production server and cause downtime... but if uptime is that critical you should already have a backup system ready to go.

      There are absolutely business situations and scenarios where a 'reboot as a last resort' is the right approach. But for a lot of people... probably the majority of them, rebooting from time to time especially in controlled circumstances makes some sense.

      If you've got dodgy hardware that might fail on a reboot or some other boot sequence problem... it's generally better to find out about it under controlled circumstances, rather than in the midst of some other data corruption/service won't start/catastrophe... the last thing you need while fighting a server problem is to have to resort to a reboot and find out your drive controller is toast too... or that some twit mangled /boot.

    15. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Where has this come from?

      I believe it may come from buggy beta firmware for unsupported bleeding edge hardware that you are presently working with the engineers on. You know! Those times when even SysRq laughs at any command you throw at it!

      These games are not for the weak of spirit...

    16. Re:Persistent myth? by idontgno · · Score: 1

      even databases without a plain text front end.

      AIX, I'm looking at you. I haven't had to admin AIX since 5.3 days, but while our team was learning AIX (coming from Solaris) we would modify system configuration by editing the /etc config files like God intended. And they'd keep reverting to pre-edit config if we had to reboot. Which happened a lot because we had some flaky hardware.

      It took the local IBM Customer Engineer weenie telling us about "SMIT" and the AIX ODM to realize we weren't editing the real system config... just the text file created from the ODM when AIX was booted.

      Damn SMIT. It's blasphemy to alter the system's config with anything besides "vi".

      --
      Welcome to the Panopticon. Used to be a prison, now it's your home.
    17. Re:Persistent myth? by Tordek · · Score: 0

      I'll give you "Windoze", but you're really complaining about boxen?

      --
      Tordek, Dwarven Warrior - Juegos de Rol en Argentina
    18. Re:Persistent myth? by amorsen · · Score: 1

      IIRC it was reasonably common advice in Unix books from the 80's (I can't provide citations because I borrowed from the library). Reboot at least weekly with a full fsck. Supposedly file systems weren't as stable back then.

      Since the myth was well and truly dead by the time I managed to touch a Unix box for the first time (1993), it seems a bit late to try to kill it.

      --
      Finally! A year of moderation! Ready for 2019?
    19. Re:Persistent myth? by pugugly · · Score: 1

      I tend to disagree - Ubuntu is designed (in large part) for the end user, and for that class where the admin == main user, sudo is a good idea, enforcing the separation of privileges but allowing reasonable usage.

      I've noticed a few articles lately about how 'real men' login as root at all times, but I've worked in Unix/Linux since the 90's, and this seems to be a recent phenomenon.

      Pug

      --
      An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media
    20. Re:Persistent myth? by Zencyde · · Score: 1

      I'm not sure if this is a Windows thing. I've seen plenty of people go to flip a switch in an attempt to turn on the lights, fail, and go to the next switch. The issue is that they don't toggle things back to their initial state. It's simply poor systems practice. In this case, you're liable to find the lights on in another part of the house and waste some electricity. In some more dire scenarios, say when a gas system is turned on, it could result in death. It's just dumb to do and people sometimes don't think through situations as thoroughly as they should. But I don't think you could blame this action on Microsoft or Windows.

      --
      What day is it? Could you please tell me?
    21. Re:Persistent myth? by Captain+Centropyge · · Score: 1

      You realize they make replacement batteries, right...?

      --
      Bite my shiny metal ass!
    22. Re:Persistent myth? by pugugly · · Score: 1

      Actually, this guy's *last* article hit Slashdot too, and I was nearly as unimpressed with that one as this one.

      Pug

      --
      An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media
    23. Re:Persistent myth? by arth1 · · Score: 2

      I've noticed a few articles lately about how 'real men' login as root at all times

      No, they don't. They only do that when they need it, and have configured their systems so they rarely need it.

    24. Re:Persistent myth? by vlm · · Score: 1

      On my server the BIOS battery is dead for the last one or two years. Rebooting it means BIOS gets fucked up and it can't boot and needs manual intervention.

      I have a Soekris 5501 that won't boot unless I attach a RS-232 terminal. Not sure if it's stuck at the BIOS level or the GRUB level. It draws about an amp, and is hooked up to a 75 amp-hour 12V battery system. It has never crashed, not in several continuous years of operation as an asterisk PBX, so I'm not overly worried, although it's an unholy pain to haul out and hook up an old Wyse terminal at the location every time I reboot.

      I believe I recall some HP or sun boxes about 20 years ago that wouldn't reboot unless they saw the correct RS-232 signals on the console, this is not a new problem.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    25. Re:Persistent myth? by Nostrada · · Score: 1

      You brought back memories I'd rather not re-visit. SMIT. Has it really been 20 years already. RIP!

      --
      Cheers, Nostrada
    26. Re:Persistent myth? by SudoGhost · · Score: 1

      That sounds like FUD, but just to be safe....where can I buy some tom-toms?

    27. Re:Persistent myth? by maxume · · Score: 1

      It seems like being reluctant to reboot is just another type of broken.

      --
      Nerd rage is the funniest rage.
    28. Re:Persistent myth? by Lumpy · · Score: 1

      It's a failure of the house design if a switch in the room turns on a light 4 rooms away or on a different floor.

      --
      Do not look at laser with remaining good eye.
    29. Re:Persistent myth? by Enigma23 · · Score: 1

      On my server the BIOS battery is dead for the last one or two years... ...Next reboot for that server is probably never going to happen - it will run until it dies and/or is replaced. Most likely destination is the scrap heap. Heck, it only had a handful of reboots in the last 7 years of operation...

      Or you could just replace the battery for the BIOS the next time you have to bring the system down? Then you can reflash the BIOS when it comes back up again and run it on for another 2 or 3 years quite happily...

      --
      Ceci n'est pas une .sig
    30. Re:Persistent myth? by kmdrtako · · Score: 0

      Going the other direction, I can't count the number of times I've seen clueless software devs writing software for Windoze -- usually former Unix devs -- who use double backslashes in their pathnames, e.g.: ...
          #if !defined(WIN32)
          FILE* fp = fopen("/path/to/a/directory/filename"...);
          #else
          FILE* fp = fopen("\\path\\to\\a\\directory\\filename", ...);
          #endif ..

      (Not to mention the Java System.file.separator being "\\" on Windows.)

      And before some twit tries to claim that those are actually not incorrect, let me remind you that command.com and cmd.exe are not the C/POSIX APIs and programming at the C library level has allowed the '/' path separator since DOS 2.0 (and probably even DOS 1.0 too). And yes, either one works, but '\\' is not necessary, and it's a POS pattern that too many people follow because they don't or can't read the docs.
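
      The same point can be illustrated outside C. This is a small Python sketch, not the C library behaviour itself: `pathlib.PureWindowsPath` applies Windows path rules on any platform and treats both separators as equivalent. The path is the placeholder from the snippet above.

```python
from pathlib import PureWindowsPath

# The same placeholder path written with each separator style.
forward = PureWindowsPath("/path/to/a/directory/filename")
backward = PureWindowsPath("\\path\\to\\a\\directory\\filename")

# Under Windows path semantics the two spellings name the same path.
print(forward == backward)
print(forward.parts == backward.parts)
print(forward.name)
```

      Because both spellings parse to the same components, a single forward-slash literal is enough and the `#if` branch buys nothing.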

    31. Re:Persistent myth? by SkOink · · Score: 1

      For the record, in the embedded world it's pretty common to see malloc() used without free(). The use case is providing pointers with memory blocks during program initialization.

      --
      ---- I'll take you in a Hunt deathmatch any day.
    32. Re:Persistent myth? by thsths · · Score: 1

      How is beating my TomTom going to help with a solar eclipse?

    33. Re:Persistent myth? by peragrin · · Score: 1

      Not really. you may want to turn on the upstairs hallway from the Dining room, kitchen, etc.

      Outside lights are also commonly set up that way. Multiple switches from multiple locations.

      --
      i thought once I was found, but it was only a dream.
    34. Re:Persistent myth? by sirsnork · · Score: 1

      When has Windows ever stored cleartext passwords in AD?

      For that matter, Telnet has been disabled by default since 2000; that's a decade ago.

      Finally, name one other OS whose local firewall policy after install is to block outgoing connections?

      --

      Normal people worry me!
    35. Re:Persistent myth? by ByOhTek · · Score: 2

      That sounds like horrible software.

      Thinking of the Windows servers I admin and used to be an assistant admin for - we usually rebooted only after a large number of other diagnostics had been tried. For our desktop users, yes, we said reboot first - but anything on the server should be stable enough not to need a reboot.

      Actually, I am to blame for the one Windows server restart at my last job that wasn't due to a patch that required it. Long day, logged onto the backup domain controller and accidentally restarted it instead of logging out (yes, it has that extra popup; it was a LONG day).

      But yeah, I think rebooting first on a server is in general just bad administration, regardless of OS.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    36. Re:Persistent myth? by wertigon · · Score: 1

      You are aware that BSD is a Unix flavor, yes?

      --
      systemd is not an init system. It's a GNU replacement.
    37. Re:Persistent myth? by mini+me · · Score: 2, Interesting

      He is quite correct in his assertion that Linux and BSD are not Unix. Without experience with real Unix systems, it would be impossible for him to verify that they exhibit the same behaviour. However, Mac OS X is Unix. I find it hard to believe that someone posting on Slashdot has not at least spent some time evaluating OS X, even if they ultimately decided it was not for them.

    38. Re:Persistent myth? by Zencyde · · Score: 1

      Somewhat often the switch controls a light on the opposite side of the wall. Usually while entering a room with no windows or turning on a light outside. Though I will admit to having lived in a house with the most confusing electrical system I've ever encountered. A fuse blowing in an unrelated room would take out an entire wall of a room that's not connected. But only a single wall.

      Either way, there are plenty of example cases where one can flip a switch that doesn't turn on lights that can be seen from the switch. Also, a switch could turn on power to a socket. Or perhaps break power on a socket, which has great potential to cause problems.

      --
      What day is it? Could you please tell me?
    39. Re:Persistent myth? by peragrin · · Score: 1

      Windows as an OS has a small memory management problem. Something doesn't get released when it should. The problem is small and only shows up a small percentage of the time, but it is persistent. Maybe some weird swap/RAM bug.

      It isn't hardware, as all Windows boxes have the same problem.

      --
      i thought once I was found, but it was only a dream.
    40. Re:Persistent myth? by c6gunner · · Score: 1

      The idea that you ought to just reboot to fix things comes from the Windows world.

      From the rest of your comment, it seems more like an idea which comes from crappy-software world.

      Starting in 2000, I had a win2k workstation running as a home server which eventually reached a 3 year uptime. Meanwhile, my current Ubuntu laptop needs to be rebooted every couple days thanks to the insane memory usage of firefox, and memory leaks caused by the graphics driver. I don't blame "linux" for my problems; it would be just as silly as blaming "windows" for your issues.

    41. Re:Persistent myth? by trollertron3000 · · Score: 1

      They don't need to use Windows to be an idiot.

      --
      Tiger Blooded Bi-Winning Machine
    42. Re:Persistent myth? by jdgeorge · · Score: 1

      Ummm... so you would seriously describe something you mount in one of these, or a virtual machine running hosted by a rack-mounted server, blade, laptop, etc, as a "box"?

      However, maybe regarding the dorky pluralization of the word "box", you have a point. Perhaps this is laziness; there is a reduction in effort in typing the letter "n", close to the right hand index finger, compared to the letter "s", way over there under the left hand "ring finger".... Or.... Well, maybe this only applies to people who type with two fingers.

      Okay, I give up. "Boxen" is just silly. Stop it.

    43. Re:Persistent myth? by _Sprocket_ · · Score: 1

      I've noticed a few article lately about how 'real men' login as root at all times, but I've worked in Unix/Linux since the 90's, and this seems to be a recent phenomena.

      I saw one article with this sentiment and I thought the guy was an idiot - or playing one for comedic effect. :P

    44. Re:Persistent myth? by Joe+U · · Score: 1

      Let's assume you have two people competing for a job. One person is using terms like 'management across Linux and Windows systems'. You have another person, talking about 'running the Windoze and Linux Boxen'.

      Best first impression goes to who?

    45. Re:Persistent myth? by Rakshasa+Taisab · · Score: 1

      To say Linux and BSD are not UNIX is like saying humans are not primates...

      Something about religious beliefs or some shit, don't ask me.

      --
      - These characters were randomly selected.
    46. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*

      - root logins everywhere
      - passwords stored in the clear in ldap (WTF??)
      - require https over http to devices, yet still have telnet access enabled.
      - set up sudo ... to allow everyone to do everything
      - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

      mmmm I think we need the kill switch after all :D

    47. Re:Persistent myth? by monkyyy · · Score: 0

      "I've noticed a few article lately about how 'real men' login as root at all times"

      the future in linux tech support is bright: hard but constant work in the near future

      --
      warning pointless sig
    48. Re:Persistent myth? by gravis777 · · Score: 1

      I'm no server admin by any stretch of the imagination, but even I know a few of these. Seriously, why are you in networking if you do not understand closing up unused services and ports? I had a test OS X server at my last job. Pretty much closed all ports except for 80 and whatever port Apple RDP ran on (actually, I closed that down at first, oops, had to take a MacBook into the server room to fix that), then opened up ports as needed. Seriously, if they are leaving things like Telnet open, they shouldn't be in IT, let alone be a server admin.

      BTW, why shouldn't you use sudo? My understanding is that it was preferable to logging in as root?

      And in response to another post, I've done a few 777s, but I don't think I would be stupid enough to do it on a server.

    49. Re:Persistent myth? by _Sprocket_ · · Score: 1

      Ahhh. He's the idiot that started off calling sudo "like bowling with only the inflatable bumpers in the gutters."

    50. Re:Persistent myth? by RocketRabbit · · Score: 2

      The BSD variants are descended from the original Berkeley Unix codebase, which is simply an enhanced form of original AT&T Unix. BSD is Unix. However, I think that of the BSD variants in use today, only Apple has had theirs certified by the Open Group, which makes it not just Unix but Unix(tm).

    51. Re:Persistent myth? by sosume · · Score: 1

      "People who don't know any better. On Windows systems sometimes the system gets so that there's a bit of corrupted memory that prevents a program from running correctly if the computer isn't completely shut off and let to sit for a few seconds before being turned back on."

      Shouldn't you call that a hardware issue? Smells like a troll ..

    52. Re:Persistent myth? by zill · · Score: 1

      Linux and BSD conform to the Single UNIX Specification, but they are not officially certified due to cost issues.

      The car analogy would be that I built my own car in the garage, but since I can't afford to build 5 more of them and send them to NHTSA for crash testing, the government cannot register the vehicle. So I can only use it on my private property. For all intents and purposes it's still a car; it's simply not registered as a car.

    53. Re:Persistent myth? by Yvan256 · · Score: 1

      Well, I'm pretty sure you must have done at least one 777 to log into Slashdot.

    54. Re:Persistent myth? by Glendale2x · · Score: 1

      - passwords stored in the clear in ldap (WTF??)

      Some things require this. 802.1x with PEAP and MSCHAPv2 is one that comes to mind. When forced to do this, just run the LDAP server alone on its own server/VM so that nothing can get to it.

      - set up sudo ... to allow everyone to do everything

      I do this (on a per-user basis) because when I do a root action I want it to be intentional. I normally work as an unprivileged user and perform the few required root actions via sudo. There's no point in restricting sudo commands to anyone who already knows the root password, but I believe it is a good habit to work as a user, not as root. (For people who don't know the root password, sure, restrict away to only commands they need to do their job.)

      --
      this is my sig
    55. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Latter. Neither sounds very good, but "management across foo" sounds too much like an excerpt from buzzword-filled nonspeech.

    56. Re:Persistent myth? by Waffle+Iron · · Score: 3, Informative

      And yes, either one works, but '\\' is not necessary and it's a POS pattern that too many people follow because they don't or can't read the docs.

      Here's a snippet from Microsoft's own current MSDN example on the PathMatchSpec() API call:

      ...
      void main(void)
      {
      // String path name 1.
      char buffer_1[ ] = "C:\\Test\\File.txt";
      char *lpStr1;
      lpStr1 = buffer_1;
      ...

      Gee, I wonder where these people get their path separator ideas? Maybe it's because they *did* read the docs.

    57. Re:Persistent myth? by rgviza · · Score: 1

      memory leaks... windows developers were taught for years that garbage collection works. In fact it was a big selling point of VB6.

      It doesn't, and not all developers realize this, so some developers don't explicitly kill objects. Maybe they all do by now and they're just sloppy.

      It's Too Hard(tm) to find all this stuff so the vendors tell you to reboot instead of spending the money necessary to fix their software.

      Even the mighty linux can have memory leaks if the developers suck.

      FWIW, I've never seen a garbage collector that actually worked 100%. We'd be better off writing good clean code instead of relying on them.

      All of that being said, even on windows you can usually kill a process and reclaim the memory. Reboots are only necessary when you have a service or driver running that gets into an unkillable state. I've seen this happen with IIS and bad NIC drivers.

      I've never seen software on a linux box get into the state where you couldn't kill it so until I do, I say linux never needs to be rebooted, except to replace hardware. Even in a failover situation, you can simply unplug the network cable to simulate an outage. No need to bounce the box...

      However, prior to the 2.6.24 kernel, the Linux memory manager was still being fixed and memory was subject to excessive fragmentation. Some drivers as well as other programs require contiguous pages of memory (network drivers being a prime example), and if the memory was too fragmented you could run into situations where you'd go to restart networking and the NIC wouldn't come back up because there wasn't enough contiguous space to restart the driver.

      See http://kernelnewbies.org/Linux_2_6_24 section 2.4 for details. There actually used to be an occasional intermittent need to reboot linux in this limited case but it was rare to need to do this and you could usually get around it by freeing up memory and restarting the troublesome driver, after which you'd bring up the other stuff running on the box. /shrug

      --
      Don't kid yourself. It's the size of the regexp AND how you use it that counts.
    58. Re:Persistent myth? by sconeu · · Score: 1

      my current Ubuntu laptop needs to be rebooted every couple days thanks to the insane memory usage of firefox, and memory leaks caused by the graphics driver

      Don't shut down the whole system, just

      1. Shut down FF periodically.
      2. If the graphics driver has leaks, restart the X server.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    59. Re:Persistent myth? by Reapman · · Score: 1

      Well, I imagine people don't make posts on Slashdot pretending they're competing for a job that doesn't exist. Some of us enjoy the fact we're not forced to use proper management-speak on a site like this, and may even use slang terms.

      Let's assume you have two people competing for a job. One person listens to what the interviewer says, the other person is correcting the interviewer on proper use of lingo.

      Best first impression goes to who?

      BTW if I was the interviewer, I'd go with the guy that actually KNEW the most and seemed to be a good fit for the team, not the guy that uses pretty words. People that feel the need to correct other people's grammar would probably NOT fit the team.

    60. Re:Persistent myth? by Anonymous Coward · · Score: 0

      - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

      Dude, don't block ICMP packets. You're breaking the internet.

    61. Re:Persistent myth? by jdgeorge · · Score: 1

      Gah! My snarkiness fails when I leave out the link to "one of these".

      You get the point, though.

    62. Re:Persistent myth? by Cytotoxic · · Score: 1

      He is quite correct in his assertion that Linux and BSD are not Unix. Without experience with real Unix systems, it would be impossible for him to verify that they exhibit the same behaviour. However, Mac OS X is Unix. I find it hard to believe that someone posting on Slashdot has not at least spent some time evaluating OS X, even if they ultimately decided it was not for them.

      BSD is not unix.... but OSX is unix? FAIL.

      BSD is direct descendant of AT&T UNIX, sharing the initial codebase.

      OSX is a direct descendant of NEXTSTEP/OPENSTEP - which is a direct descendant of BSD.

    63. Re:Persistent myth? by Smauler · · Score: 1

      On my Windows system I've never seen that. Running countless programs, running games pushing my graphics card and CPU into 70 degrees plus temperatures (I know that's not impressive, I've got better cooling than I used to have), visiting a lot of dodgy websites, I've currently got uptime of a month or so. This system is about 3 years old, with a new graphics card 6 months ago or so. I just leave it on in the winter... in the summer, I object to it heating my house when I'm not there.

      Uptime is not an achievement for home computers, especially when they can boot to desktop in 20 seconds or so.

    64. Re:Persistent myth? by value_added · · Score: 1

      Your comments may be correct, but file paths (in general) have always been a problem in Windows, and the problem still exists today. If a *nix programmer has to do anything on Windows, getting into the habit of escaping the "escape character" is the first step to becoming productive, and a prerequisite for learning the finer points of the voodoo quoting necessary on Windows. A trivial example I have handy:

      [HKEY_CLASSES_ROOT\mboxfile\shell\open\command]
      @="c:\\cygwin\\bin\\run.exe bash --login -c \"rxvt -e mutt -f \\\"`cygpath -u '%1'`\\\"\""

      I've seen and written much worse. The fact that in recent years some Windows commands accept a forward slash (in place of a backslash) in certain limited contexts is hardly a consolation for this nonsense.

      I'd suggest cutting the *nix programmers some slack.

    65. Re:Persistent myth? by Halo- · · Score: 1

      Going the other direction, I can't count the number of times I've seen clueless software devs writing software for Windoze -- usually former Unix devs -- who use double backslashes in their pathnames, e.g.: ...
      #if !defined(WIN32)
      FILE* fp = fopen("/path/to/a/directory/filename", ...);
      #else
      FILE* fp = fopen("\\path\\to\\a\\directory\\filename", ...);
      #endif ..

      (Not to mention the Java System.file.separator being "\\" on Windows.)

      And before some twit tries to claim that those are actually not incorrect, let me remind you that command.com and cmd.exe are not the C/POSIX APIs and programming at the C library level has allowed the '/' path separator since DOS 2.0 (and probably even DOS 1.0 too). And yes, either one works, but '\\' is not necessary and it's a POS pattern that too many people follow because they don't or can't read the docs.

      I'd argue that having a variable which might end up being used in a string context and is not properly escaped is a pretty sizable mistake.
      For example:

      const char* filename = "/wow/windows/lets/me/fopen/without/escaping/this";
      #if !defined(WIN32)
      FILE* fp = fopen(filename,...);
      if(fp == NULL) {
      fprintf(stderr,"Failed to open %s\n",filename); // ut-oh...
      }
      ...
      #endif

    66. Re:Persistent myth? by Anonymous Coward · · Score: 0

      BSD is derived from the original AT&T UNIX sources. It is as much UNIX as anything else in use today - if BSD isn't UNIX then OSX is even less so.

    67. Re:Persistent myth? by Xest · · Score: 1

      I see, so they manage Linux boxes but are Windows admins?

      The root logins everywhere issue is not a Windows thing as any good Windows admin knows not to use the Administrator account, similarly Windows doesn't store passwords in the clear either, so again, that's not something they've learnt from Windows.

      The problem isn't Linux or Windows admins, that's just the correlation you've chosen to blame. No, the problem is just plain old shitty incompetent staff, and they occur in all walks of life, and yes, even in Linux/Unix-only environments.

    68. Re:Persistent myth? by Smauler · · Score: 1

      Really? How long does this take to show itself? Because I've run Windows systems for months with no degradation in performance.

    69. Re:Persistent myth? by Scarletdown · · Score: 1

      It's a failure of the house design if a switch in the room turns on a light 4 rooms away or on a different floor.

      Reminds me of a Steven Wright joke.

      I have a light switch in my house that doesn't turn anything on and off. Every once in a while, I would just go ahead and flip it up and down a few times. One day I got a letter from a lady in Germany that said "knock it off".

      --
      This space unintentionally left blank.
    70. Re:Persistent myth? by element-o.p. · · Score: 4, Interesting

      As for "ALL(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of linux users by aping Windows privacy escalations by abusing sudo.

      Yeah, I agree with you in principle, although to be fair, there really isn't a way that Ubuntu could know what user account you are going to set up before you actually set it up, and therefore, there isn't really a way for Ubuntu to create an appropriate sudoers entry to give admin privileges to the server admin.

      Learn to use groups, setfattr...properly...

      Okay, agreed...

      Learn to use...setuid/setgid properly...

      Ugh...setuid and setgid, IMHO, should be used as little as possible. If there's a security hole in your app, then having it setuid/setgid allows a sufficiently skilled user the ability to gain elevated privileges. I'd much prefer to use sudoers to give access to specific apps to people I trust than give any user access to an app I "trust" through setuid/setgid.

      ...leave admin commands to administrators, and you won't need sudo.

      Maybe I'm just missing something, but that sounds really stupid to me. While I'm a reasonably skilled Linux admin, I don't pretend to know everything, and maybe you can teach me something I've missed in my experience so far. If so, cool. But from my perspective, sudo is an ideal tool for granting appropriate permissions as required to trusted individuals. Sudo logs the user name and command in the log files, so if someone is abusing sudo, you know. Sudo can e-mail failures to admin staff, so if someone is habitually trying to exceed their permissions, you know. Sudo allows pretty fine-grained access to users based upon group or user name, so you can easily allocate permissions as required (well, relatively easily, anyway) -- much more fine-grained than Unix User/Group/Other permissions would allow. For example, with sudo you could allow senior admins (group: admin) and web developers (group: www-dev) read/write permissions to CGI script directories, junior admins (group: jadmin) read-only permissions and all other users (group: users) no access. Uh-oh...we've got four groups here: admins, jadmins, www-dev and users, so doing that with standard Unix permissions is going to be kind of difficult (admins could be members of the www-dev group I suppose, but I can imagine cases where group A might need permissions to a subset of files that group B owns, but shouldn't have access to another subset, which would really complicate things). Sudo is a powerful tool, and just like all the other tools you mentioned, should be used appropriately as a component of overall system security.

      find /home/* -user 0 -print

      If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.

      Maybe. I see what you are saying, but as a counter-example, I sometimes run tcpdump from within my home directory when troubleshooting problems. tcpdump has to run as superuser, and I have a lot more faith in giving myself and other admins permission to run "sudo tcpdump" than running tcpdump setuid 0. Again, maybe I'm just missing something, but I really don't have a huge problem with tcpdump (or other admin tools) writing UID 0 data to an admin user's home directory.
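
      The find check quoted above boils down to comparing each file's owner UID with 0. A sketch of that single test in C, assuming a POSIX system; owned_by_root is a hypothetical name, walking the directory tree is left out, and stat() follows symbolic links, which the find command by default does not.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if path is owned by UID 0, 0 if it is owned by another user,
   and -1 if the path cannot be examined (for example, it does not exist). */
int owned_by_root(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        return -1;
    }
    return st.st_uid == 0 ? 1 : 0;
}
```

      A capture file written by "sudo tcpdump" into an admin's home directory would return 1 here, which is exactly the case being debated.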

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    71. Re:Persistent myth? by Creepy · · Score: 2

      Linux and many BSD flavors are not UNIX - UNIX is a trademark of the Open Group, and anyone that doesn't pay for certification and licensing of the trademark from the Open Group cannot call themselves UNIX. Apple has paid for certification and trademark usage, so OS X is UNIX; Linux and many flavors of BSD have not.

    72. Re:Persistent myth? by Hatta · · Score: 1

      253 comments and rising. Compared to the less trollish articles immediately surrounding this one on the front page, the strawman has done his job.

      --
      Give me Classic Slashdot or give me death!
    73. Re:Persistent myth? by Beelzebud · · Score: 1

      OSX is based on BSD, so if BSD isn't Unix (by your definition) then how the hell does OSX qualify?

    74. Re:Persistent myth? by element-o.p. · · Score: 2

      I've noticed a few article lately about how 'real men' login as root at all times, but I've worked in Unix/Linux since the 90's, and this seems to be a recent phenomena.

      Yeah, I've seen that, too. I cut my sys admin teeth in a shop where we used sudo extensively. After four years, I did not have the root password to any of the *Nix servers we had (nor did I want them), but I did have "sudo all" permissions. After I left that job, I came to my present environment where the senior admin didn't want to bother setting up sudoers (to be fair, there were only two of us in the sys admin role, so if he didn't run a command as root, he knew who did...), and the fact that I sign in as root on our servers *still* makes me cringe.

      IMHO, and perhaps veering slightly off-topic, "real men" are secure enough in their own virility that they don't have to resort to acts of reckless bravado to prove how "manly" they are <shrug>

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    75. Re:Persistent myth? by richlv · · Score: 1

      "It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

      why would you want to beat tomtoms ? :/

      on a more serious note, that part about "common myth" is just stupid. no unix-like system admin i've met has advocated for that. quite the contrary, all of them have argued heavily against it and for finding the root cause of the problem.

      the only ones who have screamed "reboot !" were windows people who were trying to figure out this linux thing.

      an important thing to note is that i don't see such a claim (about persistent myth) in the article - seems to be an artistic (read, crap) submission. also, splitting that short article on two pages makes me want to kick them hard when accessing it over gprs.

      --
      Rich
    76. Re:Persistent myth? by Thyrsus · · Score: 2

      Unix is a trademark owned by The Open Group, and you may use that trademark to describe your system if you pay money to have them run their tests to verify compliance with the Single Unix Specification. I believe Red Hat has done that in the past, and that particular version of Linux was thus bona fide Unix(R), but it seems Red Hat has not chosen to continue certifying their systems. Someone please correct me if I'm wrong.

      I believe Red Hat sent back upstream all the changes they needed to make to pass the test; I presume many others also worked on conformance to the standard. Sometimes those behaviors aren't there unless the POSIXLY_CORRECT environment variable is set.

      Thus, while not "legally" Unix, Linux normally does realize all the concepts and behaviors of real Unix.

    77. Re:Persistent myth? by element-o.p. · · Score: 1

      Okay, I did once have a PC where the battery was actually embedded in a plastic brick that had to be soldered to the motherboard, but that was one of the original Pentiums, and of all the computers I have worked on, it was the only one I ever saw with a battery like that. So, on the likely assumption that your server is not a freak like that PC I had...wouldn't it be better, safer and easier to just replace the battery than hope your server doesn't go off the air at an inopportune time?!?!

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    78. Re:Persistent myth? by gman003 · · Score: 1

      The sudo thing doesn't seem that bad, unless the year of "Linux on the desktop" has finally arrived. For servers, pretty much everyone who has any access should have full access - anybody who should be logging into a database server should be able to do essentially anything that's necessary. Obviously, this depends on not giving access to people who don't need it (for instance, developers shouldn't have access to the production servers). And when I speak of "everyone", I mean "every person", not "every account" - system accounts used by daemons and such shouldn't have root access anyways.

      Sudo is not a perfect security solution, nor is it designed to be. If a user really wants to run a command as root, he'll find a way. Sudo rules will not stop an admin gone postal. Sudo is intended to make root logins unnecessary, stop uneducated users from doing things they ought not be doing, and to limit (or at least slow) the damage an intruder can do. That's it.

      My personal BSD box has some very simple sudo rules: my personal account and root are permitted to do everything (with the user's password required every half-hour). Everyone else is not allowed to use sudo. And I would be fine with using such a sudo configuration in a real production environment.

    79. Re:Persistent myth? by Anonymous Coward · · Score: 0

      "It's a persistent myth that calling free() after malloc() is unnecessary, but some software engineers disagree..."

      Calling free() after malloc() is unnecessary in any case where the only remaining thing the program does is exit. This is because the OS will be reclaiming those pages at program exit anyway, and it doesn't so much matter what is on them. This should only be taken advantage of if

      1) you know you're running on a reasonable OS,
      2) all the memory allocated is needed at the same time (so you're not shrinking the total footprint), and
      3) you won't be needing to track down memory leaks.

    80. Re:Persistent myth? by peragrin · · Score: 1

      Depends on the applications you're using and what system calls they make.

      I have seen it both ways myself. As I said, it is a small problem. Because of the number of Windows computers, it is noticeable. I don't think it is even easy to recreate. There are ATC servers at LAX that need weekly reboots. Yet a home user playing games may or may not be fine.

      The exact combination seems unknown. A bug that is difficult to reproduce in closed software will never get fixed, as each vendor will blame the others.

      --
      i thought once I was found, but it was only a dream.
    81. Re:Persistent myth? by isorox · · Score: 1

      I've noticed a few article lately about how 'real men' login as root at all times

      No, they don't. They only do that when they need it, and have configured their systems so they rarely need it.

      Indeed, because they have sudo.

      Two options
      1) Let everyone with a business need to access the box have the root password for cases when they really do need it. In reality, they su to root as soon as they log in.
      2) Let everyone with a business need to access the box have unfettered sudo access. This way you can see what mayhem they've caused in the log files.

      In my line of work, we've got better things to do than defend boxes against incompetent engineers. If you've got a logon to the box, you should be trustworthy enough to know what you're doing.

    82. Re:Persistent myth? by VortexCortex · · Score: 1

      For the record, in the embedded world it's pretty common to see malloc() used without free(). The use case is providing pointers with memory blocks during program initialization.

      Hell, since the OS reclaims all allocated memory when a program exits, calling malloc() without free() for "singleton" instances and other program-lifespan objects is common.

      For the record, I think All programs & libs should at least have a few #ifdef DEBUG sections in which the free() is called simply to make debugging memory leaks easier...

      (... GTK+ is in "common" use, and doesn't free() everything it malloc()s. This is one of the reasons I hate debugging GTK+ apps. You need the "exceptions" file for your version of the lib just to give Valgrind a go because someone decided that a few milliseconds of free() at the program close is just a waste. I could see this reasoning for rapid execution of terminal apps, but for GUI frameworks?)
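
      A sketch of the #ifdef DEBUG idea above, assuming a single program-lifetime object. The names struct config, config_get and config_shutdown are hypothetical. Release builds skip the free() and let the OS reclaim the memory at exit, while builds compiled with -DDEBUG free it so a leak checker has nothing to report for this object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical program-lifetime object. */
struct config {
    int verbose;
};

static struct config *g_config;   /* lazily created singleton */

/* Return the singleton, allocating it on first use.
   Returns NULL only if the allocation fails. */
struct config *config_get(void)
{
    if (g_config == NULL) {
        g_config = calloc(1, sizeof *g_config);
        /* calloc zero-fills, so a fresh config starts non-verbose. */
        assert(g_config == NULL || g_config->verbose == 0);
    }
    return g_config;
}

/* Call once just before exit. Only DEBUG builds release the memory. */
void config_shutdown(void)
{
#ifdef DEBUG
    free(g_config);
    g_config = NULL;
#endif
}
```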

    83. Re:Persistent myth? by hicham · · Score: 1

      I'm not familiar with Unix itself enough to comment, but with both Linux and *BSD you're able to start and restart services without a reboot and the architecture is such that you're much less likely to end up in a situation where you can't perform whatever action you need to in order to clear the error manually. I'm not sure how I would even go about looking up how to do a lot of that with a Windows box.

      And windows does NOT have services.msc where you can start, stop and restart services? Like, apache or mysql, for example... Just sayin....

    84. Re:Persistent myth? by element-o.p. · · Score: 1

      +1

      I had a similar problem with FreeBSD at a previous job. While still new to the OS, I tried to rsync changes to the /etc/passwd and /etc/group files to multiple servers (they were a farm of mail servers running ClamAV, spamassassin and some custom scripts as a front end to an ISP mail server). After syncing the directories from the server where I actually made the changes, I didn't understand why the changes weren't taking on the other servers, and consequently ended up ssh'ing to each server in the farm and duplicating the commands on each one. I later discovered that unlike Linux, FreeBSD uses the /etc/passwd, /etc/group and /etc/shadow files as the source for a Berkeley DB, rather than reading the plain-text files for account information...sigh.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    85. Re:Persistent myth? by element-o.p. · · Score: 1

      Heretic! :)

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    86. Re:Persistent myth? by Torodung · · Score: 1

      It is a "persistent myth" that the words "persistent myth" appear in TFA.

      Perhaps it was redacted.

    87. Re:Persistent myth? by Creepy · · Score: 1

      Kind of wrong - BSD is not UNIX and not technically even UNIX-like because the Open Group defines UNIX as the trademark name that is UNIX, not the OS itself. It is kind of like calling every soda pop type beverage a Coke (and there are places that do!), even though they have nothing to do with Coke. BSD (and Linux) is technically an OS that operates similarly to OSes that carry the UNIX trademark (that's about the best I can do).

    88. Re:Persistent myth? by kc8jhs · · Score: 1

      OS X is mostly Openstep which is mostly Nextstep which is mostly BSD 4.3 and Mach 2.5.

      I don't know what you consider 'Unix' but I would say the best description of OS X is that it is closer to BSD than anything else.

      Just for fun sometime turn off graphical boot on an OS X machine, and watch the Regents of the University of California message scroll by ;)

    89. Re:Persistent myth? by kmdrtako · · Score: 1

      Arguing that even the Microsofties can't get it right is not a particularly compelling argument.

      (Next rebuttal will be "....that's your opinion...")

      Yes, and on the topic of reading--- If you actually read what I wrote--- I 'said' C/POSIX APIs (and even the old DOS Int21 'system calls') – and my example was of a Standard C library function.

      PathMatchSpec() is neither a C nor a POSIX API.

    90. Re:Persistent myth? by TheRealGrogan · · Score: 1

      The best impression goes to the guy who says "Windoze and Linux boxen" because he's a real person, who probably has some real experience.

      But that's just me. I'm not impressed by the use of IT buzzwords. "I worked on all the boxes" is better than "I was a visionary who was entrusted with integration, management, monitoring and analysis of the first and second tier systems" (or some such inflated language for a technician)

    91. Re:Persistent myth? by Jesus_666 · · Score: 1
      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
    92. Re:Persistent myth? by element-o.p. · · Score: 1

      In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort.

      It's rare that I reboot a *Nix box to clear a problem, but I have had a problem with some Linux-based routers that I use where SNMP hangs and can't be killed until the box is rebooted. Since we use SNMP to keep tabs on the routers (they are providing access to our anchor-tenant customer, located about 500 miles away from my desk), if SNMP quits working, it's typically worth a reboot to fix...although I'll wait until off-hours to do so.

      I haven't personally been bit by rebooting a Linux box and making everything worse... But I've seen enough other people get bit, and I've read enough horror stories on-line.

      I have, and again, it was on those same Linux-based routers*. A couple of times, I have sent a reboot command to a router whose snmpd was sufficiently hosed that the box failed to shut down. Unfortunately, it tends to fail at a point where SSH has already ended, so I lose access to the router. Fortunately, it tends to fail at a point before ospfd and forwarding have shut down, so while I can neither monitor nor manage the router, it hasn't quit passing customer traffic. So as far as getting bitten goes, it's not terribly bad, but it is annoying.

      *In fairness, these routers have proven to be extremely reliable, powerful and flexible. Our customer tends to have far more problems with the Cisco routers they connect to our Linux-based routers than we have with our routers. On 70 or so deployed routers, I have an snmpd problem maybe once every two or three months. Perhaps twice a year, I'll actually have to reboot a router to clear snmpd, and I think I've had three or four of those reboots require me to dispatch someone to physically power off the router (in five years).

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    93. Re:Persistent myth? by Anonymous Coward · · Score: 1

      What part of "BSD" is hazy in your thoughts?

    94. Re:Persistent myth? by c6gunner · · Score: 1

      Yes, I know there are ways to fix each individual issue without actually restarting the system, but there are enough persistent problems that I have the option of doing a reboot - which takes about a minute and a half - or screwing around for half an hour trying to fix each individual problem. Likewise, the other guy could probably figure out a way to make his server run more smoothly without rebooting every night, but the reboot is the easier solution.

      Anyway, until I get hibernation to start working again, I don't have much motivation to look into a better solution since I end up having to restart the laptop at least once a week anyway.

    95. Re:Persistent myth? by Creepy · · Score: 1

      Tell that to Apple - every time I patch my (ancient - its OS is approaching end-of-life) mac, it tells me I have to reboot.

      And though Linux is not UNIX, same thing with Ubuntu. The only reason I have it at all (in a VM, mind you, I prefer just about any other distribution because Ubuntu just seems to make developer work hell) is to support a friend because she is relatively clueless in any flavor of Linux, but I like to encourage the noobs to branch out from their Microsoft overlords.

      My pure Linux box does get rebooted for more than just kernel patches, but usually because I use the patching mechanism and not manually patch like I did on my old gentoo box (and I could have a full time job patching that gentoo box, which is one reason I got rid of it).

    96. Re:Persistent myth? by zoips · · Score: 2

      FWIW, I've never seen a garbage collector that actually worked 100%. We'd be better off writing good clean code instead of relying on them.

      Garbage collectors for languages like C/C++ are conservative because they can't always tell what's a pointer to an object or not, so these will not work 100% of the time. In languages where that isn't an issue (such as Java, Python, Ruby, whatever), the only time a garbage collector will fail is if it struggles with cyclical structures. Java's garbage collector works 100% of the time: if an object is not reachable, it will be reclaimed. Don't confuse this, however, with not having memory leaks; shitty developers who forget they've stashed a live reference to an object have only themselves to blame when the garbage collector does not rightly collect their object.
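
      To make the reachable-versus-leaked distinction concrete, here is a minimal sketch in Python rather than Java, assuming CPython's standard gc and weakref modules. Node, make_cycle and the cache list are made-up names for illustration only. An unreachable cycle is reclaimed, while an object still referenced from a forgotten list is not.

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Build a two-object reference cycle and return only a weak reference,
    # so nothing outside the cycle keeps it alive after this call returns.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def demo():
    cache = []  # hypothetical "stash" that a developer forgot about

    cyclic = make_cycle()

    leaked_obj = Node()
    cache.append(leaked_obj)      # the forgotten live reference
    leaked = weakref.ref(leaked_obj)
    del leaked_obj                # the local name is gone, the cache entry is not

    gc.collect()                  # run the cycle collector explicitly

    # A dead weak reference returns None when called.
    return cyclic() is None, leaked() is None


print(demo())  # (True, False): the cycle was collected, the cached object was not
```

      The second object is not a collector failure: it is still reachable through cache, which is exactly the kind of "leak" described above.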

    97. Re:Persistent myth? by Anonymous Coward · · Score: 0

      I am a "windoze" admin who recently has been put in charge of several linux boxen. A wide variety from ubuntu to fedora. Of course I have "tinkered" with linux for sometime, but never in a serious administrative role.

      As someone who is new, and before I get too far into things to really screw them up, are there any books/guides you would recommend to learn about linux best practices for system administration?

    98. Re:Persistent myth? by Straterra · · Score: 1

      Depending on the language, you may be required to do this. I know I need to do this when doing Windows + C++/Qt development.

    99. Re:Persistent myth? by nabsltd · · Score: 1

      I normally work as an unprivileged user and perform the few required root actions via sudo. There's no point in restricting sudo commands to anyone who already knows the root password, but I believe it is a good habit to work as a user, not as root.

      I tend to either "su - root" or login directly as root (if I'm on the console), because there's nothing I do as a user on my servers...they are there to run services, and users generally don't log in directly. So, when I need to log in, it's to fix something, install an update, etc. The only advantage sudo has for this sort of administration is an audit trail.

      If I'm troubleshooting an actual user login, I usually have to either try to log in as them or su to them to see what they are experiencing, since my non-root user almost certainly doesn't have the same groups, etc., as they do.

      For giving users "root-like" privileges (like changing to another non-root user without knowing that user's password), sudo is a great tool.

    100. Re:Persistent myth? by allo · · Score: 0

      yeah, with sudo you just do "sudo -s". now you have a nice rootshell, without any further logging and without the need to redecide if sudo is needed on every command.
      no big advantage to su. but it's easier to revoke access of one user without telling all users a new password.

    101. Re:Persistent myth? by mini+me · · Score: 1

      Unix is something you pay to be. Apple has paid, BSD and Linux have not.

    102. Re:Persistent myth? by Straterra · · Score: 1

      I, unfortunately, still have to admin AIX 4 and 5 servers. No RIP for SMIT in my life.. :/

    103. Re:Persistent myth? by mini+me · · Score: 1

      You have to pay the OpenGroup to be considered Unix. OS X qualifies because Apple has ponied up the cash. That is why they call BSD and Linux Unix-like operating systems.

    104. Re:Persistent myth? by Anonymous Coward · · Score: 0

      I could have sworn that Mac OS X was a Mach kernel. And since when has any BSD based system not been considered Unix?
      Linux would be as much a unix variant as OS X...

    105. Re:Persistent myth? by C0vardeAn0nim0 · · Score: 1

      To say Linux and BSD are not UNIX is like saying humans are not primates...

      Something about religious beliefs or some shit, don't ask me.

      following the analogy, all OSes are primates. but they're different kinds of primates.

      saying "BSD and linux are not unix" could be construed as "capuchin and spider monkeys are not apes".

      or something like that.

      where's BadAnalogyGuy when we need him ???

      --
      What ? Me, worry ?
    106. Re:Persistent myth? by Anonymous Coward · · Score: 0

      "He is quite correct in his assertion that Linux and BSD are not Unix."
      It is simply not true, BSD is a real UNIX, it was one of the two major UNIX distributions (the other being SCO System V).
      The whole "UNIX-like" is a fucking trademark issue which may be of interest to lawyers and other pests but not nerds.

    107. Re:Persistent myth? by gnapster · · Score: 1

      As for "ALL(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of linux users by aping Windows privilege escalations by abusing sudo. Learn to use groups, setfattr and setuid/setgid properly, leave admin commands to administrators, and you won't need sudo.

      This surely would not make a difference in cases where there is only one user account on the machine. I imagine that this is the state of a significant portion of Ubuntu installs, if not the majority. I don't have access to an Ubuntu box right now, but Linux Mint Debian Edition's default configuration has the following lines:

      Defaults env_reset
      root ALL = (ALL:ALL) ALL
      %sudo ALL = (ALL:ALL) ALL

      Only the user created during installation is placed in group sudo by default. So long as I am the only user who does any administration of the machine, I cannot see how this is a problem. That said, it is a good point that providing a relatively permissive, generic configuration does little to help a budding administrator learn the flexibility of sudo's configuration.

      find /home/* -user 0 -print

      If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.

      Does /home/lost+found count? ;c)
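
      For anyone who wants the same check from a script, a rough Python equivalent of the find command quoted above might look like this. It is only a sketch, assuming a POSIX system; files_owned_by is a made-up helper name, and unlike find it silently skips entries it cannot stat.

```python
import os


def files_owned_by(root, uid=0):
    """Return sorted paths under root whose owner has the given numeric uid.

    Symlinks are examined themselves (lstat), not followed.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    matches.append(path)
            except OSError:
                # Entry vanished or is unreadable; skip it rather than abort.
                continue
    return sorted(matches)
```

      Calling files_owned_by("/home") would report a root-owned /home/lost+found just as find does, so yes, it counts.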

    108. Re:Persistent myth? by randallman · · Score: 1

      accidentally restarted it instead of logged out (yes, it has that extra popup)

      Am I in the minority to think that using a GUI as the primary server interface is retarded? I know there's Monad (or PowerShell) now, but it seems like nobody uses it except in arguments to say "Windows can do that too". On a Unix or Linux system, there would be no way to confuse logging out with shutting down. And you would need to get root to shut the server down. I'm not kidding or trolling; it is sad that so many people find administration via RDP to a server acceptable and that they can't see that "ssh server command" is sooo much more efficient.

    109. Re:Persistent myth? by John+Hasler · · Score: 2, Informative

      ...only Apple has had theirs certified by the Open Group, which makes it not just Unix but Unix(tm).

      No. That makes it Unix(tm) but not Unix. With a hacked Mach kernel, a modified BSD userland, and a totally custom GUI it is considerably less like Unix than is Linux. BSD, on the other hand, is a direct descendant of Seventh Edition Unix. The fact that Open Group was willing to sell Apple a trademark license shows just how worthless that trademark is.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    110. Re:Persistent myth? by gnapster · · Score: 1

      Actually, everything I see in those pictures looks very rectangular and box-like. :c)

    111. Re:Persistent myth? by SWPadnos · · Score: 2

      Um. This has nothing to do with the path separator and everything to do with the C language.

      In C, the character '\' is the escape character. That's how you can print newlines ('\n'), tabs ('\t'), and other things. Since the backslash has a special meaning sometimes, you have to escape it with a backslash if you want one in your string.

      To get the string literal "C:\command.com" in your program, you have to declare it as "C:\\command.com" in the C source.
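
      Python string literals happen to follow the same backslash convention, so the point can be checked there. This is a sketch in Python, not C, and the function and variable names are illustrative only.

```python
def path_literals():
    escaped = "C:\\command.com"  # doubled backslash in source, single one in the string
    raw = r"C:\command.com"      # raw literal: backslashes are not treated as escapes
    unescaped = "\testtmp"       # "\t" quietly becomes a TAB, not backslash + "t"
    return escaped, raw, unescaped


escaped, raw, unescaped = path_literals()
print(escaped)  # prints C:\command.com
```

      Here escaped and raw are the same 14-character string, while unescaped starts with a tab character, which is the bug the doubled backslash avoids.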

      --
      - The Sigless Wonder
    112. Re:Persistent myth? by TheQuantumShift · · Score: 2

      So I took 2 minutes and actually read the article. The point was that Unix is not Windows and reboots are not a fix-all. No straw man, just common sense advice for those MCSE's out there. It's also good advice for Windows, but after a few attempts to discover root cause only to find out that the MF'n Event Log is "corrupt and cannot be read", I don't blame people for just rebooting/reinstalling. Hell, it's what MS says to do; which just goes to show they don't even know how their black box works...

      --

      Shift happens. Fire it up.
    113. Re:Persistent myth? by Waffle+Iron · · Score: 1

      Ok, how about _mkdir()?

      ...
        if( _mkdir( "\\testtmp" ) == 0 )
        {
            printf( "Directory '\\testtmp' was successfully created\n" );
            system( "dir \\testtmp" );
      ...

      (Next rebuttal will be "Microsoft stuck an underscore in front of the name, so it doesn't count!!!")

      Face it, Microsofties like to use "\\" in their path separators more often than not. Their documentation does *not* discourage this practice in any significant way. They have little reason to deter this practice because it's yet another subtle way to encourage developers to break compatibility with non-MS operating systems, which in turn increases the costs of migrating away from Microsoft's "ecosystem".

    114. Re:Persistent myth? by slimjim8094 · · Score: 1

      Why would I have any system configuration files in my user directory? Using sudo for a user config file is a waste of time, not a security breach.

      I think you're railing against a red herring.

      --
      I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
    115. Re:Persistent myth? by Anonymous Coward · · Score: 0

      ... what are you talking about? You know forward slashes don't need to be escaped, right?

    116. Re:Persistent myth? by David+Gerard · · Score: 1

      Every time our Unix boxes reboot we discover something essential we had running that we forgot to put in /etc/init.d/ .

      --
      http://rocknerd.co.uk
    117. Re:Persistent myth? by C0vardeAn0nim0 · · Score: 1

      what you described was called "validation boot" at the old EDS, but it's not done to _solve problems_, which is the point of TFA, it's done to avoid _future_ problems, and only done during scheduled maintenance, which doesn't count against the SLA.

      --
      What ? Me, worry ?
    118. Re:Persistent myth? by TheHedonismBot · · Score: 3, Informative

      Maybe. I see what you are saying, but as a counter-example, I sometimes run tcpdump from within my home directory when troubleshooting problems. tcpdump has to run as superuser, and I have a lot more faith in giving myself and other admins permission to run "sudo tcpdump" than running tcpdump setuid 0. Again, maybe I'm just missing something, but I really don't have a huge problem with tcpdump (or other admin tools) writing UID 0 data to an admin user's home directory.

      You don't have to be root to use tcpdump. On ubuntu, do this:

      sudo aptitude install libcap2-bin
      sudo setcap cap_net_raw,cap_net_admin=eip `which tcpdump`

      If you run: getcap `which tcpdump` and it shows: /usr/sbin/tcpdump = cap_net_admin,cap_net_raw+eip then you're good to go. Now try running tcpdump as a regular user.

    119. Re:Persistent myth? by gnapster · · Score: 1

      I think maybe the problem described is more akin to a memory leak than faulty hardware: AutoCAD apparently crashes, but some memory isn't freed, and maybe it won't start again unless that memory is free again. That sounds to me like a problem with Autodesk's software, not Windows, but perhaps Windows' memory management allows it to happen when Linux does not. Or maybe it was something else. Who can say? All I know is that I used to have an XP box that would lose track of its optical drive after running for a couple of days. Sometimes in the middle of playing a CD.

    120. Re:Persistent myth? by Heretic2 · · Score: 1

      He is quite correct in his assertion that Linux and BSD are not Unix. [...] However, Mac OS X is Unix.

      How the fuck did you arrive at that conclusion given that OSX is based on BSD?

    121. Re:Persistent myth? by mlts · · Score: 1

      Even worse:

      ALL: NOPASSWD:ALL

      The only place I expect to see this is the sudoers file of my iPhone, because I have an obnoxiously long (>64 characters) root and mobile password on my device.

    122. Re:Persistent myth? by dcray2000 · · Score: 1

      I think this is universal to all admins. We have to regularly clean up both our windows and *nix systems due to bad administration. It's not the tech, it's the inexperienced associate running it.

    123. Re:Persistent myth? by David+Gerard · · Score: 1

      "That sounds like horrible software."

      Welcome to Proprietary-Land! We have a piece of software we absolutely rely on that is only certified for Java 5. That was end-of-lifed in late 2009. We're asking them about certifying for a version of Java that isn't DEAD AND ROTTED.

      Vendors. Can't rip their hearts out with a rusty meathook ... 'cos they haven't got any.

      --
      http://rocknerd.co.uk
    124. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Thank you. I'm not a *nix greybeard, just a guy who has been messing with home and work computers for 33 years now, and I'd never heard of this either.

      Seriously Taco, wtf? Have _you_ ever heard this? Save stuff like this for April first. Make that day a compilation of the year's facepalm submissions.

    125. Re:Persistent myth? by Jim+Efaw · · Score: 1

      "It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

      Odd: that's pretty much the intro line to well over a third of all programming on History Channel in the U.S. now. (Another third is historic battles recreated as computer animations with some guy talking about equipment like it was a football game; the rest is people selling crap someone had in their basement, which is about as close to actual history as they get now.) Watch for a revealing look (except not) at the life of Unix admins next season: The Admin's Book of Secrets.

    126. Re:Persistent myth? by shutdown+-p+now · · Score: 1

      The fact that in recent years some Windows commands accept a forward slash (in place of a backslash) in certain limited contexts is hardly a consolation for this nonsense.

      All Windows commands accept slashes in place of backslashes in paths and interpret them correctly, since DOS 2.0 (when DOS first got directory paths). The only trick is to get the shell (i.e. command.com / cmd.exe) to not interpret it as a switch, since it will treat slash as an argument separator. Simply quote paths with slashes (i.e. 'dir "/foo"'), and you'll be fine.

    127. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Technically Mac OS X is a variant of OpenBSD. Most Unix obsessives won't acknowledge that experience with any *nix derivative gives any insight into Unix.

    128. Re:Persistent myth? by Halo- · · Score: 1

      ... what are you talking about? You know forward slashes don't need to be escaped, right?

      grr.... of course they don't. However, as a long-term Unix-only developer, my fingers pretty much type forward slashes for paths, even when trying to write an example of backslashed paths.

      But, if you replace my forward slashes with backslashes, my above example holds. :)

    129. Re:Persistent myth? by mlts · · Score: 1

      In the days where there was one sysadmin running a herd of machines, just logging in as root was A-OK, so logging in and su-ing to root would slow things down.

      Times have changed though. Accountability logs, even ones that are gotten around by using sudo -i or sudo -s are a must have in a lot of corporate environments to please the auditors. Client or regulatory requirements might require separate accounts for admins, so a box that only has just a root user is not allowed in a lot of places; it has to have the user's admin ID and the root user at the minimum.

      Personally, I feel better using sudo. It may not be "manly", but it does minimize the amount of time a "#" prompt is on the screen, which in the overall picture of both footshooting and security, is a good thing. Having a proper audit trail is also a good thing when it comes to CYA ability.

    130. Re:Persistent myth? by hierofalcon · · Score: 1

      A good reference to look through is "Guide to the Secure Configuration of Red Hat Enterprise Linux 5" issued by the Operating Systems Division Unix Team of the Systems and Network Analysis Center, National Security Agency.

      It goes over most of the daemons - how to lock them down - which ones to disable completely.

      Although oriented toward Red Hat, its concepts can be extended to most Linux and many Unix environments.

      Beyond that, most concepts related to best practices tend to be universal. If you're a good system administrator in Windows, you'll be a good system administrator in Linux once you learn the equivalents. Set up test environments whenever possible. Make sure patches work in the test environment before rolling out to production. Many distributions have a much higher patch frequency than Windows and also patch more parts of the system frequently. Fedora tends to be one of the most cutting edge distributions out there and changes massively without necessarily guaranteeing that a particular patch won't cause you some level of grief. CentOS distributions will follow the patch frequency of RHEL, just a bit delayed. These, and other equivalents, try very hard not to change APIs in between major releases. With other distributions, very little is left untouched between releases. Test on non-critical systems first and thoroughly. Back up securely and test your backups.

    131. Re:Persistent myth? by Anonymous Coward · · Score: 0

      I should add, the server is 2000km away. ;) That makes it more difficult to replace the battery. I don't want to bother with "remote hands" replacing a battery for no great reason.

      Rebooting *is* technically possible - BIOS only resets if it is a cold boot. But there is really no reason to do it. The server runs everything in containers these days. Services are reasonably simple. It is ECC ram of course, so there is no memory corruption as with non-ECC machines.

      In the server's entire lifespan (close to 7 years now!), it only failed to come up 2 or 3 times. These were
        1. motherboard died
        2. configuration error - my bad - needed someone remotely to fix a stupid /etc/fstab error. This is also why rebooting should never be a "hasty" job
        3. ISP's UPS fried resulting in the battery problem popping up

      In its entire time, it only was rebooted a handful of times. It runs a custom kernel. But there is another problem with the current kernel it runs - it can fail to reboot ;) I don't mean *start*, I mean *shut down*. There is some race condition in the kernel that only shows up when the server is being shut down. It doesn't oops, but it causes a lockup. So, that is another reason not to issue "reboot" ;)

      Did I also say that it is near the end of its lifespan?

    132. Re:Persistent myth? by Blakey+Rat · · Score: 1

      Windoze admins who are now in charge of linux boxen.

      You can tell how much experience he has because he types like a 13-year-old script kiddy.

      - root logins everywhere
      - passwords stored in the clear in ldap (WTF??)
      - require https over http to devices, yet still have telnet access enabled.
      - set up sudo ... to allow everyone to do everything
      - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

      And... which of those is the OS' fault?

    133. Re:Persistent myth? by mini+me · · Score: 1

      You have to pay the OpenGroup to be Unix. BSD is only Unix-like. It's a silly point, but I was being fair to the parent since he obviously recognizes the distinction but wasn't clear on what made Unix, Unix.

    134. Re:Persistent myth? by billcopc · · Score: 1

      Amen! Security is a juggle between risk, convenience, and accountability. There are those who always favor the path of least risk (and greatest inconvenience), and there are others who take a more progressive approach.

      I'm the kind of guy who logs in as root, usually with a private key. Anything less will limit my ability to fondle the boxes remotely via ssh/scp/rsync. On top of that, I often whitelist a handful of IPs for SSH, maybe with port knocking as a fallback if I'm traveling. If anyone modifies that whitelist, I get an alert on my phone. This, to me, is good enough. If a trusted user does something stupid that brings the box down, they get a roundhouse to the ear, and a bill at the emergency rate for my time to repair it. If a malicious user somehow gets through all the defenses, well in the worst case I can take down the interface and repair the damage via an OOB KVM.

      The way I see it, if that's not "good enough", the client is more than welcome to pay more for the extra time it takes me to perform common maintenance tasks, or pay more for redundant hardware and a multi-stage deployment/backup scheme. Call me crazy, but I favor the hardware route. Hardware is cheap, my time is not. I'd much rather spend 15 minutes reloading a known-good backup image from the day before, than 4 hours reinstalling an OS and scouring all the client files for trojans.

      --
      -Billco, Fnarg.com
    135. Re:Persistent myth? by mlts · · Score: 1

      I solve that by stating "UNIX flavor", "UNIX variant", or "UNIX-like OS" when splitting hairs. This essentially covers the gamut of anything like this. If I know it is a true UNIX (TM), I will state "UNIX". However, only a relative few operating systems have the Open Group trademark (OS X, AIX, and Solaris are the main ones.)

    136. Re:Persistent myth? by malakai · · Score: 1

      I must be missing something obvious here. But from reading those source examples, the double backslash is just to escape the single backslash. All documentation has it this way unless you use your programming language of choice to define the string as a literal, e.g. (C#):

      string literal = @"This\is\a\literal\string";

    137. Re:Persistent myth? by roman_mir · · Score: 2

      I find it hard to believe that someone posting on Slashdot has not at least spent some time evaluating OS X, even if they ultimately decided it was not for them.

      - hmmm. I've worked with computers since about 91 and professionally since 95 and I only really touched the old apple machines a few times. So yeah, there are people here who didn't evaluate OSX (and not intending to)

    138. Re:Persistent myth? by Anonymous Coward · · Score: 1

      Really...? The cost to evaluate OS X is non-zero, non? So unless one knows someone else with a Mac, and that someone is willing to let you poke and prod their box (ooohhh... dirty minds...), how is one supposed to evaluate OS X? With suitable, if not ideal, alternatives, there's not tons of reason to go out of one's way to do this.

    139. Re:Persistent myth? by Anonymous Coward · · Score: 0

      > Unix is something you pay to be.

      I have to *PAY* to become a WHAT!? Um, I think I'm in the wrong line.

    140. Re:Persistent myth? by FormOfActionBanana · · Score: 1

      He's not talking about MS Windows systems. He's describing the effects of poor Unix administration skills.

      --
      Take off every 'sig' !!
    141. Re:Persistent myth? by similar_name · · Score: 1

      All Windows commands accept slashes in place of backslashes in path and interpret it correctly, since DOS 2.0

      Interesting. I just tried dir "/nvidia" and it worked. Then I tried md "/test" and it failed saying the syntax was not correct. md "\test" worked and rd "/test" worked. So I'm thinking some things work like that and some still don't.

    142. Re:Persistent myth? by juasko · · Score: 1

      Please look at the *nix/Unix tree at wikipedia.org.

      You'll see that Linux stands alone from all the others; the others have inherited from each other here and there, going back to UNIX. Linux hasn't.

      Well that said, if anyone made the certification, some version of linux could probably be certified as UNIX. MacOSX 10.0-10.4 are to be considered *nix. While MacOSX 10.5-10.6 are to be considered UNIX.

    143. Re:Persistent myth? by Anonymous Coward · · Score: 0

      hmmm. I've worked with computers since about 91 and professionally since 95 and I only really touched the old apple machines a few times. So yeah, there are people here who didn't evaluate OSX (and not intending to)

      You lost me at 'not intending to'. Admittedly, those were your last few words, but if you're that disinterested in OSX I'm gonna go out on a limb and guess you're equally not interested in Linux. And that smells a lot like long-term career-death disastersville in the making. My uncle 'retired' when his mainframe jobs evaporated.

      It NEVER hurts to dabble in other stuff that is out there. Powershell, C#, OS-X automation, mysql vs. MS SQL vs. Oracle vs. CouchDB, whatever. I don't mean bootcamp, and I agree you can't explore it ALL, but proudly wearing blinders serves no good end.

    144. Re:Persistent myth? by theCoder · · Score: 2

      At work I have a Windows box that, I kid you not, can only run about 25000 processes between reboots. It doesn't seem to be able to reuse process IDs, and once it gets to about PID 100000 or so (Windows PIDs are always multiples of 4), it just can't reliably spawn new processes. Including the process to shutdown and reboot the computer (equivalent of `shutdown'). Windows seems to generate PIDs somewhat randomly, so sometimes creating a process is able to find a good PID and it works, but other times it can't find a PID so it fails.

      Now, a normal person running Word and Excel probably wouldn't notice this. But I wanted to use this computer to build software using make. Well, make creates a lot of processes, especially with lots of subdirectories and sub-makes. Suffice it to say, it doesn't even make it through 'make clean' before running out of PIDs.

      I never did figure out what caused this issue. Probably a bad combination of kernel-level drivers. One post online thought a similar problem might be related to the video card driver, but I'm inclined to think that the anti virus (Norton) is at least partly responsible. I was able to duplicate this on other similar machines, but not every machine.

      So, anyway, maybe your software spawns a lot of processes and is running out of PIDs after a day or so.

      --
      "Save the whales, feed the hungry, free the mallocs" -- author unknown
    145. Re:Persistent myth? by shutdown+-p+now · · Score: 1

      Yup, I missed one more thing - leading slash is generally treated by DOS/Windows console programs as the start of a switch, same as "-" on Unix (note that this is a separate issue from cmd.exe splitting arguments on "/" - that one causes "dir /foo/bar" to be interpreted as "dir /foo /bar"; how "/foo" and "/bar" are interpreted by the command is then up to it).

      So this fails for the same reason why "mkdir -foo" would fail on Unix to create a directory named "-foo". But in Unix you'd just do "./-foo", whereas here we are trying to get an absolute path. If you are not averse to drive letters you can just use one - i.e. mkdir "c:/foo" - thus avoiding any backslashes. But the entire point of this exercise was to get a path that is valid in both Win and Unix, so this doesn't really solve the problem. I can't really think of any other workaround, unfortunately.
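
      A minimal sketch of the Unix side of this, run in a scratch directory (the names -foo and -bar are just placeholders): either prefix the name with ./ or end option parsing with "--".

      ```shell
      # Work in a throwaway directory so nothing real is touched.
      scratch=$(mktemp -d)
      cd "$scratch" || exit 1

      mkdir ./-foo    # the ./ prefix stops mkdir from reading -foo as options
      mkdir -- -bar   # "--" marks the end of options, so -bar is taken as a name

      ls -d ./-foo ./-bar
      ```

      Neither trick produces an absolute path that cmd.exe would also accept, so it doesn't change the conclusion above.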

    146. Re:Persistent myth? by Xtifr · · Score: 2

      I believe Red Hat has done that in the past

      A company called Lasermoon once got their flavor certified. I don't believe that Red Hat ever did so.

      At the time, the biggest issue was a feature called STREAMS (all-caps), which Linus refused to include in the kernel, arguing that it was unnecessary for a system that came with source. Caldera (now SCO) acquired Lasermoon and included STREAMS in some of their versions of Linux, and was lobbying to have it included as a standard feature despite Linus's objections, but I don't believe that any flavor of STREAMS ever appeared in RH.

      According to Wikipedia, STREAMS are now an optional feature in the latest Single Unix Spec (SUS), so a system like Linux (or BSD) that lacks STREAMS can now be certified as Unix(tm), but that was not always true, so for a long time, Linux did not "realize all the concepts and behaviors of real Unix," and the only reason it can now is because the definition was deliberately changed to allow it to be included.

      As an ironic side note, the requirement for STREAMS seems to have been dropped at just about the same time (circa 2003) that Caldera morphed into a mad dog and began attacking the rest of the Linux community.

    147. Re:Persistent myth? by Anonymous Coward · · Score: 0

      "It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

      You mean I shouldn't have smashed my GPS during the last eclipse?

    148. Re:Persistent myth? by Anonymous Coward · · Score: 0

      You're a moron. The string only requires a single backslash, but the compiler requires the double backslash to get it into the string because it is the escape character. The ONLY time you don't need to do this in C is the path in a #include because it is used verbatim by the preprocessor and is not considered a string literal. Fuckwit.

    149. Re:Persistent myth? by Anonymous Coward · · Score: 0

      But that was the original point - you can type paths with forward slashes in Windows code too. If you do use backslashes, you need to escape them, or you're not creating a string with backslashes (i.e. the same problem you're talking about with displaying the non-escaped string would occur inside fopen).

      He's not saying Windows will let you away without escaping backslashes, he's saying people should use forward slashes instead of the ugly escaping required.

      ["It's been 1 hour, 25 minutes since you last successfully posted a comment" - holy balls, Slashdot, just how long am I supposed to wait?]

    150. Re:Persistent myth? by smellotron · · Score: 1

      IMHO, and perhaps veering slightly off-topic, "real men" are secure enough in their own virility that they don't have to resort to acts of reckless bravado to prove how "manly" they are <shrug>

      That may be true, but "Real Men use white iBooks" has less flair than "Real Men reboot UNIX servers all the time for no good reason and then kick the Windows admins with their Energy Legs."

    151. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Minute and a half? Get out of here.

      1. Close Firefox. Choose "remember tabs." Restart.

      2. I know that Ubuntu *used to* have the key combination of Ctrl+Alt+Backspace (Ctrl+Shift+Backspace?) that would restart X, but they disabled it. It only takes 5 minutes to reenable, though.

      Logging back on should eliminate the 30 seconds-1 minute rebooting. Normally I don't see what the big deal about a single minute is, but in this case there really doesn't seem to be a reason to reboot. Unless there's something that I don't know about restarting X, which is likely.

    152. Re:Persistent myth? by man_of_mr_e · · Score: 1

      The author makes a pretty glaring fault in his logic.

      1) Unix admins don't need to reboot to make sure the system still boots, forgetting to configure things correctly is a rookie move which good admins won't need to do.

      2) Rebooting is bad because some junior admin might have screwed up your /boot or /etc directory

      Uhhh... What? Great logic there, Gomer.

    153. Re:Persistent myth? by smellotron · · Score: 1

      I must be missing something obvious here.

      Yes, you are. The point is that Microsoft's environment does allow forward-slash "/" as a path separator, which is POSIX. However, precedence in the MSDN documentation is to continue using backslash "\\" (or r"\" in python or @"\" in C#). So sometimes people end up jumping through imaginary hoops:

      #ifdef _WIN32
      printf("%s\\.foorc", HOME);
      #else
      printf("%s/.foorc", HOME);
      #endif

      I'd also like to point out that cmd.exe will only tab-complete paths if backslashes are used, further emphasizing that MS has no interest in encouraging developers to switch to forward-slashes.

    154. Re:Persistent myth? by element-o.p. · · Score: 1

      Point taken. Next question: do you want a regular user to run tcpdump? Granted, most networks are switched, so the potential security hole is rather small in a typical business scenario, but if you happen to be on a network segment connected with a hub (or on a shared server, for example), it is something to consider. You'd be surprised what I've seen customers broadcasting in the clear over the Internet feeds my employer provides to them...

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    155. Re:Persistent myth? by linuxpyro · · Score: 1

      Well, Linux is functionally very much like UNIX, even if it hasn't inherited code. (Unless BSD code has made its way into Linux? I'm guessing not.) I would imagine that if someone organizing a GNU/Linux distro had the cash to get it certified, they'd probably rather pay developers or hosting or whatever instead.

      --
      Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
    156. Re:Persistent myth? by arth1 · · Score: 1

      Indeed, because they have sudo.

      No, because they have setfattr, chmod and su, and know how to use them.

      sudo is almost always used wrong. Either on a single-user system, where the user is also the administrator, in which case it's no problem for the user to also have the root password, or on multi-user systems to avoid having to learn how to set access control.

      The legitimate use for sudo, when a command is to be used both as the superuser and not as a superuser, and a wrapper is impractical, is hardly ever seen.

    157. Re:Persistent myth? by arth1 · · Score: 1

      Ugh...setuid and setgid, IMHO, should be used as little as possible. If there's a security hole in your app, then having it setuid/setgid allows a sufficiently skilled user the ability to gain elevated privileges. I'd much prefer to use sudoers to give access to specific apps to people I trust than give any user access to an app I "trust" through setuid/setgid.

      That's why you use setfattr on the setgid executables to control who can run them, and groups (or, indeed, setfattr) to control who can execute setuid executables.

    158. Re:Persistent myth? by Anonymous Coward · · Score: 0

      /home/lost+found

    159. Re:Persistent myth? by arth1 · · Score: 2

      Point taken. Next question: do you want a regular user to run tcpdump?

      chmod o-x /usr/sbin/tcpdump
      chgrp adm /usr/sbin/tcpdump
      chattr +i /usr/sbin/tcpdump

      Now only members of the adm group can run tcpdump, and no one can make a hardlink to it either.

      Or, you can allow individual users:
      setfacl -m u:someone:rx /usr/sbin/tcpdump
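
      To sanity-check what the first chmod does without touching the real binary, the same mode change can be rehearsed on a scratch file. This is only a sketch: the temp file stands in for /usr/sbin/tcpdump, the starting mode of 755 is an assumption, and the chgrp and chattr steps are left out because they need root.

      ```shell
      f=$(mktemp)          # stand-in for the real binary
      chmod 755 "$f"       # assumed starting mode: rwxr-xr-x
      chmod o-x "$f"       # same change as above: "other" loses execute
      mode=$(ls -l "$f" | cut -c1-10)
      echo "$mode"         # permission string after the change
      ```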

    160. Re:Persistent myth? by arth1 · · Score: 1

      The problem is that you have config files in your user directory with the same name as config files for the root user. When you run an app with sudo, and it writes a config file, it will write it relative to $HOME, which is your directory, instead of to root's directory.
      Which causes the problem that the user can't overwrite the files later, because they're owned by root.

      So yes, the problem is very real. If I had a dime for every time I have encountered it, I'd be set for coffee for a week.
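
      A rough way to spot the leftovers, assuming a find that supports -user and -xdev (both are in POSIX). The helper name owned_by is made up for this sketch.

      ```shell
      # List everything under directory $1 that belongs to user $2 (name or numeric uid).
      owned_by() {
          find "$1" -xdev -user "$2" -print
      }

      # Typical use after a sudo session has written into your home directory:
      #   owned_by "$HOME" root
      ```

      Anything it lists can be handed back to the user with chown, run as root.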

    161. Re:Persistent myth? by TheHedonismBot · · Score: 2

      Point taken. Next question: do you want a regular user to run tcpdump?

      Create user "tcpdumper" in group "tcpdumper".
      chgrp tcpdumper `which tcpdump`
      chmod 754 `which tcpdump`

      Ensure tcpdump works only for root and a member of the 'tcpdumper' group.

    162. Re:Persistent myth? by arth1 · · Score: 1

      This surely would not make a difference in cases where there is only one user account on the machine. I imagine that this is the state of a significant portion of Ubuntu installs, if not the majority.

      In which case there surely is no problem with that user having the root password either, and doing "su -" when needing superuser access, no?

      At least that way, you don't have a problem with sudo commands writing to your $HOME directory, causing you to have to use sudo the next time too, to avoid errors because the user can't write to a root owned file...

    163. Re:Persistent myth? by HeronBlademaster · · Score: 1

      I think the reason Ubuntu instructs the user to reboot is that relatively few Ubuntu users would understand that they don't need to reboot, they "just" need to manually restart services A, B, and C, kill Gnome, and then log in again... That sort of thing is far beyond the capabilities of your average computer users.

      Besides, who really wants to look through dozens of updates to figure out the affected services? I know I'd rather just reboot. It takes far less time than reading all the update notes and then trying to remember whether any services that depend on those updated packages (but were not themselves updated) need to be restarted as well.
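
      For anyone who does want to try, one rough way on Linux to answer the "which services?" question without reading update notes is to look for processes that still map a shared library that has since been replaced on disk; the kernel marks those entries "(deleted)" in /proc/PID/maps. A sketch only, with a made-up helper name, and it needs root to see other users' processes:

      ```shell
      # True if the given maps file shows a shared library that was deleted or replaced.
      uses_stale_lib() {
          grep -q '\.so[^ ]* (deleted)$' "$1" 2>/dev/null
      }

      # Print the PID and command name of every process still using an old library.
      for maps in /proc/[0-9]*/maps; do
          if uses_stale_lib "$maps"; then
              pid=${maps#/proc/}
              pid=${pid%/maps}
              echo "$pid $(cat "/proc/$pid/comm" 2>/dev/null)"
          fi
      done
      ```

      It still leaves the restarting itself to you, which is the tedious part.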

    164. Re:Persistent myth? by arth1 · · Score: 1

      It's blasphemy to alter the system's config with anything besides "vi".

      Oh, I don't know. I probably use echo and perl as much as vi (nvi).

      echo "foo" >>filename
      perl -pi.bak -e 's/x/y/' filename

    165. Re:Persistent myth? by Tacvek · · Score: 1

      I would most certainly consider rack mounted devices as boxes. Most 1U and 2U devices would be "pizza boxes", but 3U or larger rack mount devices are just plain boxes.

      --
      Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
    166. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Ugh...setuid and setgid, IMHO, should be used as little as possible. If there's a security hole in your app, then having it setuid/setgid allows a sufficiently skilled user the ability to gain elevated privileges. I'd much prefer to use sudoers to give access to specific apps to people I trust than give any user access to an app I "trust" through setuid/setgid.

      Because obviously any setuid binaries are world-executable -- we couldn't just have a group for the people you trust and only grant them execute permission?

    167. Re:Persistent myth? by multipartmixed · · Score: 1

      > yet still have telnet access enabled.

      There is nothing wrong with having telnet enabled: the problem arises when you *use* it.

      Telnet access is a great way to get in if something goes horribly, horribly wrong, sshd is hung, and you need to fix the box *NOW*.

      Unfortunately, *using* telnet means you have to essentially treat the box as if it was rooted, once you have the sshd running again -- although telnetting in from another box on the same switch mitigates a lot of risk.

      --
      Do daemons dream of electric sleep()?
    168. Re:Persistent myth? by Joe+U · · Score: 1

      Well, I imagine people don't make posts on Slashdot pretending they're competing for a job that doesn't exist. Some of us enjoy the fact we're not forced into use proper management-speak on a site like this, and even may use use slang terms.

      You are right, I should have used a car analogy thrown in with Linsux, Crapple and the M$, because it makes me look smart.

      BTW if I was the interviewer, I'd go with the guy that actually KNEW the most and seemed to be a good fit for the team, not the guy that uses pretty words. People that feel the need to correct other people's grammer would probably NOT fit the team.

      You would hire someone, in a company, to support your critical network infrastructure, who talks like an idiot, because you think they know more? The term 'well rounded' comes to mind.

    169. Re:Persistent myth? by Anonymous Coward · · Score: 0

      And... which of those is the OS' fault?

      None of them. All are the admins' faults, which is what GP was trying to say. Very good - you've learned to read words on the page. Try to understand their meanings now.

    170. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Technically Mac OS X is a variant of FreeBSD. Most Unix obsessives won't acknowledge that experience with any *nix derivative gives any insight into Unix.

      FTFY

    171. Re:Persistent myth? by JumpDrive · · Score: 1

      I think you could spend a lot of years working on Linux and not have a clue what to do on a Unix system and vice versa.
      BSD not so much.
      But the difference between Linux, BSD and Unix is much smaller than with Windows and either of these operating systems.
      I have to deal with Windows admins daily who think the only way to fix a computer is to reboot it. Yeah, just keep rebooting, that DNS server will just start up. But as with any server, in some cases rebooting is the last thing you want to do.
      I did admin an Exchange server a long while back and inherited the advice: if it goes down, reboot it first. About once a month I'd reboot it. Heck, we had people in the office who would just go in and reboot the system just because they hadn't gotten an email that day.
      When we were first starting out we didn't have a proper server room, one night a VP decided we were wasting a bunch of electricity and he decided to turn off all the computers that weren't being used. Shut down all file shares, backup systems , mail and DNS, authentication, the whole damn lot. I'm sure he was walking out to the parking lot thinking he was saving us mega bucks that night.

    172. Re:Persistent myth? by BlueBlade · · Score: 3, Insightful

      - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

      As a network admin, I have violent fantasies of driving hot nails through the privates of the "Let's block all ICMP by default" admins whenever I come up at a new client's site to troubleshoot some complex networking issues. If you block ICMP echo, you better have an extremely good reason for it. If it's from a public WAN link facing the internet, then *maybe* you might have a case (but most often not). If it's on a web server or other public-facing services, you PROBABLY DON'T HAVE A VALID REASON. If you block traceroutes from anywhere except edge firewalls, you are a clueless idiot. And even then, requests coming from inside interfaces should be let through. THIS IS ESPECIALLY TRUE OVER MPLS AND Site-to-Site VPN LINKS!

      Whew, that felt good. Seriously, blocking icmp doesn't do *anything* for security. If you are getting flooded by icmp packets, just configure a flood threshold. These days, any icmp DoS flood that is bad enough to actually interrupt services very likely doesn't need the extra "reply" traffic to work. And your clever "security" of not replying to pings on anything that has ports open is stupid, as a simple port scan will reveal the host.
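
      For what it's worth, a flood threshold on Linux can be sketched with the iptables limit match. The rate and burst numbers below are placeholders, not recommendations, and the script only prints the rules unless IPT is set to the real iptables binary:

      ```shell
      # Dry run by default: prints the commands. Set IPT=iptables (as root) to apply them.
      IPT=${IPT:-echo iptables}

      icmp_rules() {
          # Accept echo requests up to an assumed rate...
          $IPT -A INPUT -p icmp --icmp-type echo-request -m limit --limit 5/second --limit-burst 10 -j ACCEPT
          # ...and drop only the excess, instead of dropping ICMP wholesale.
          $IPT -A INPUT -p icmp --icmp-type echo-request -j DROP
      }

      icmp_rules
      ```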

      Please, for the sake of every network admin's sanity, leave ICMP alone. Thank you.

      --
      Religion is the best example of mass psychosis
    173. Re:Persistent myth? by Anonymous Coward · · Score: 0

      He's never had a team of admins working in different countries at different times who step on each other and need to sort out who did what...sudo makes this trivial since you know who, ran what, trivially instead of a dozen root users all stepping on each other's history. Oh, and never had to admin hundreds of boxes and used the history to crutch their fallible memory for the obscure service restart command which will exist in their own history but will be quickly squashed in root's. Or used expect to automate admin tasks using sudo without having to allow remote root logins (of any kind, ssh, whatever, we were paranoid), keep their shell-script-in-a-single-line commands you used that one time, or a couple of dozen other reasons sudo is terribly useful for real servers (where real servers have more than one admin, or one room of admins).

      andy

    174. Re:Persistent myth? by bzipitidoo · · Score: 1

      If you are afraid to reboot your server when it's working fine because you don't know it will come back up, then you ALREADY HAVE A PROBLEM.

      Hear! Hear! What's with this hangup with rebooting? Do it, or not, as appropriate. I don't reboot out of wishful hoping that problems will be magically fixed, but I reboot quite often for other reasons. Such as, the company changed data centers and the machines have to be physically relocated, and before the big move, I want to be sure they will come back up. Kernel updates are pretty routine, but less diligence there can save a lot of unnecessary work, and possibly avoid trouble. If it's a high availability production machine, then I will spend extra time if by doing so I can avoid having to reboot. Even then, ought to have failovers, backups, and so forth, so that one machine at a time can be taken down for maintenance.

      Some nasty problems I've encountered were 2 machines that had been configured with the same static IP address, and a drive partition that had been formatted with XFS with unusual parameters. No apparent problems, until a reboot. Then the load balancer grabbed the other machine with the same IP, and wouldn't listen to the first machine any more. The machine with the XFS partition couldn't mount it after a reboot-- parameters in fstab weren't correct. A routine problem is the kernel update in Arch Linux that breaks XWindows because the proprietary graphics driver has to be redone. (I just reinstall the kernel to fix that one. And reboot a 2nd time.) There are all kinds of ways UNIX machines can be ticking time bombs if you never check that booting up still works. Forgot to add sshd to the startup routines? Now you have to go to the data center. Rather like the cute trick of using fdisk to delete all the partitions on a Windows box while it is running. All is well, until the reboot....

      If it isn't high availability, isn't running a database that takes 20 minutes to shut down and start up, then what's the big deal with doing a reboot? Usually the fastest way to switch to new versions of libraries, XWindows, etc. I'm not going to go in there and manually kill and restart dozens of processes, or close hundreds of terminal sessions that went zombie when the users disconnected without logging out first (assuming there isn't a timeout in place), when 1 reboot lets me move on to something else much more quickly.

      --
      Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    175. Re:Persistent myth? by Anonymous Coward · · Score: 0

      whats wrong with ICMP. I swear every admin is paranoid about it.

    176. Re:Persistent myth? by Anonymous Coward · · Score: 0

      i can't hear you. you said boxen.

    177. Re:Persistent myth? by Anonymous Coward · · Score: 0

      You've described the perfect situation for directory ACLs/extended attributes (depending on the version of *nix you have) not sudo.

    178. Re:Persistent myth? by rtfa-troll · · Score: 1

      The one who manages to solve my programming problems. If he can do that then I don't care if he speaks like a pirate.

      --
      =~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
    179. Re:Persistent myth? by roman_mir · · Score: 1

      you are trolling and unsuccessfully. Am I interested in Linux? I do not write the kernel code. I use Fedora 11, Debian and Open BSD on the servers for business and I switched most of the office to Ubuntu machines as well.

      I am not interested in OSX and any Apple products.

    180. Re:Persistent myth? by lakeland · · Score: 1

      No, he was being pedantic but is correct: UNIX is a trademarked term, and neither BSD nor Linux is licensed to use that term, while OSX does have such a licence. At a technical level, of course you are correct.

    181. Re:Persistent myth? by Compaqt · · Score: 1

      Did you catch the discussion on /. a few months back re: (I think it was) Fedora? They (or some group associated with them) are advocating against sudo as a security risk. I.e., it allows people to break into a normal user account and become root.

      The strange thing is, you've got some practices that are touted as "best practice" while at the same time being castigated as a security risk. So what are normal people (as normal as geeks can be) to do?

      --
      I'm not a lawyer, but I play one on the Internet. Blog
    182. Re:Persistent myth? by juasko · · Score: 1

      jepp, and that is why it is a *nix not UNIX. As we'll never know if it actually qualifies. But hey so what, live with it it's close enough.

    183. Re:Persistent myth? by MichaelSmith · · Score: 1

      And OpenCola is a Coca-Cola like beverage.

    184. Re:Persistent myth? by Anonymous Coward · · Score: 0

      I saw a "Windows guy" take over admin of a Unix system for a financial institution, processing teller transactions and real money. For the most minor glitch, he'd want to reboot. We used to call him "Reboot Eric".

      Maybe this was something that they taught in college on non-production machines. If something goes wrong, reboot. If that doesn't fix the problem, reload the operating system from external media. Coming from an environment where we had years of consecutive up-time on numerous machines, I thought it was insane! He would reboot several times a week.

      "Reboot Eric" went on to host a local computer radio show at the University radio station (a "dot commentary" type show), and now is CEO of his own company. I wonder if he remembers his indiscretions of youth.

    185. Re:Persistent myth? by AmiMoJo · · Score: 1

      There is the right way to admin a server and there is what you do when every time there is a problem your phone starts ringing with people screaming at you because they can't work and why haven't you fixed the server yet and how long will it take and what about their important meeting at 2pm etc.

      Eventually you get fed up with people flushing shit down your pipe and keep a root ssh/rdp session open for the next inevitable turd coming your way. The life of an admin is thankless, people only come to you with problems and go nuts because not being able to access that email or print that file makes them look bad. You could do it properly but no-one will care.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    186. Re:Persistent myth? by AmiMoJo · · Score: 1

      In my experience most Windows admins actually hate rebooting their servers and will do everything they can to avoid it. The simple reason is fear of them not coming back up again.

      Windows is particularly troublesome because if it doesn't boot to the login screen you can't access it remotely (no early telnet or SSH, or even RS232 console) and because Windows Update likes to do stuff at reboot time. This combination is double bad because you can't see the update install progress remotely, only sit there running ping and praying. Buggy Windows Updates and third party apps seem to hose the system more often than most Unix ones do too.

      Server 2008 R2 is getting better, but doing an IIS update still seems to need a reboot, whereas doing an Apache update usually doesn't.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    187. Re:Persistent myth? by mikechant · · Score: 1

      I think the reason Ubuntu instructs the user to reboot is that relatively few Ubuntu users would understand that they don't need to reboot, they "just" need to manually restart services A, B, and C, kill Gnome, and then log in again...

      If you look at the detailed output from (say) synaptic or whatever tool you use to do the updates, you can see that quite frequently the updates are packaged to automatically stop and restart affected services, including the network, and a reboot is avoided.
      I assume that the line is drawn at anything which needs the gui restarting, since in those cases it's going to involve any open gui applications being closed/reopened and in that case it's more user-friendly for this to be done at a time of the user's choosing and simpler to just set a 'restart required' flag.
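
      As far as I know, on Debian and Ubuntu that flag is a plain file, /var/run/reboot-required, usually with the responsible packages listed next to it in reboot-required.pkgs. A small sketch for checking it from a script; the function name is made up:

      ```shell
      # Succeeds if the "restart required" flag file exists (path can be overridden for testing).
      reboot_needed() {
          [ -f "${1:-/var/run/reboot-required}" ]
      }

      if reboot_needed; then
          echo "reboot requested by:"
          cat /var/run/reboot-required.pkgs 2>/dev/null || true
      else
          echo "no reboot flag set"
      fi
      ```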

    188. Re:Persistent myth? by mikechant · · Score: 1

      I know that Ubuntu *used to* have the key combination of Ctrl+Alt+Backspace (Ctrl+Shift+Backspace?) that would restart X, but they disabled it. It only takes 5 minutes to reenable, though.

      5 minutes? more like 30 seconds.

      Anyhow, what's wrong with the new default key combination AltGr+PrtScr+K?

    189. Re:Persistent myth? by gravis777 · · Score: 1

      :-p

    190. Re:Persistent myth? by narooze · · Score: 1

      Sudo allows pretty fine-grained access to users based upon group or user name, so you can easily allocate permissions as required (well, relatively easily, anyway) -- much more fine-grained than Unix User/Group/Other permissions would allow. For example, with sudo you could allow senior admins (group: admin) and web developers (group: www-dev) read/write permissions to CGI script directories, junior admins (group: jadmin) read-only permissions and all other users (group: users) no access. Uh-oh...we've got four groups here: admins, jadmins, www-dev and users, so doing that with standard Unix permissions is going to be kind of difficult

      That's what you have (POSIX) ACLs for.
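
      A sketch of the four-group layout from the quote, using setfacl. The group names come from the quote, /srv/www/cgi-bin is a made-up path, and the script only prints the commands unless SETFACL is set to the real setfacl binary:

      ```shell
      # Dry run by default: prints the commands. Set SETFACL=setfacl (as root) to apply them.
      SETFACL=${SETFACL:-echo setfacl}
      dir=/srv/www/cgi-bin   # hypothetical CGI directory

      acl_rules() {
          # Senior admins and web developers: read/write (x is needed to enter directories).
          $SETFACL -R -m g:admin:rwx,g:www-dev:rwx "$dir"
          # Junior admins: read-only.
          $SETFACL -R -m g:jadmin:r-x "$dir"
          # Everyone else: no access.
          $SETFACL -R -m o::--- "$dir"
      }

      acl_rules
      ```

      New files created later would also need default ACLs (setfacl -d), which this leaves out.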

    191. Re:Persistent myth? by Maximum+Prophet · · Score: 1

      Cool, Sys V Unix had a similar problem. New pids were created monotonically and stored as 16-bit values (or 15). When the pids on a system rolled over, things started going wonky.

      An old 3B20 wasn't fast enough to roll the pids for any useful processes before the machine crashed or rebooted, but when the port of Sys V Unix to AIX was done, the new machines were much faster and pid allocation was rearranged so that they would be reused much quicker. That was 20 years ago, so I don't remember the code, but it was an easy fix.

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
    192. Re:Persistent myth? by drinkypoo · · Score: 1

      That's why you use setfattr on the setgid executables to control who can run them, and groups (or, indeed, setfattr) to control who can execute setuid executables.

      As long as it's easy to wipe out those extended attributes then there's no sense in using them.

      When we get some GUI support for actually using that kind of stuff, and the tools preserve them properly across the board (or close enough) then it will make sense. Until then there's sudo. Or selinux, but who is actually using it properly?

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    193. Re:Persistent myth? by drinkypoo · · Score: 1

      -H

              The -H (HOME) option sets the HOME environment variable to the homedir of the target user (root by default) as specified in passwd(5). The default handling of the HOME environment variable depends on sudoers(5) settings. By default, sudo will set HOME if env_reset or always_set_home are set, or if set_home is set and the -s option is specified on the command line.

      always_set_home

              If enabled, sudo will set the HOME environment variable to the home directory of the target user (which is root unless the -u option is used). This effectively means that the -H option is always implied. Note that HOME is already set when the env_reset option is enabled, so always_set_home is only effective for configurations where either env_reset is disabled or HOME is present in the env_keep list. This flag is off by default.

      hope this helps

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    194. Re:Persistent myth? by drinkypoo · · Score: 1

      2. If the graphics driver has leaks, restart the X server.

      It's not really leaks but state. The driver can get the card into a state from which it cannot retrieve it. I had this problem more with ATI but I have it plenty with nVidia, and on both Windows and Linux. The Linux nVidia driver in particular used to love to put cards into a state where the screen was black but sync was being provided. Now I mostly have problems I can fix by killing and restarting compiz.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    195. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Never hire Windows admins to run *nix servers.
      Do not hire *nix admins who are not at least certified in the flavor you are running. If they do not have years of experience on top of that, make sure they are junior admins and keep them contained to simpler things until you can determine their true knowledge.
      If you are too small of a shop to do this, then keep a proper consultant on retainer/support contract to have them handle the system and do not fuck with it.

      This is solid advice for running any server

    196. Re:Persistent myth? by vux984 · · Score: 1

      what you described was called "validation boot" at the old EDS

      Understood. But a lot of the issues people have with a "troubleshooting boot" are the same with a "validation boot". -- a lot of admins just don't want to EVER reboot. "if it ain't broke, don't fix it."

      And a lot of the reasons people have for not doing a "troubleshooting boot" is that they haven't done a validation boot in so long they have no confidence that it will work.

    197. Re:Persistent myth? by Elshar · · Score: 1

      I'm inclined to think that the anti virus (Norton) is at least partly responsible

      Well, there's your problem right there!

    198. Re:Persistent myth? by element-o.p. · · Score: 1

      The strange thing is, you've got some practices that are touted as "best practice" while at the same time being castigated as a security risk. So what are normal people (as normal as geeks can be) to do?

      The answer lies in risk management: if you want a perfectly secure computer, remove the hard drive, encase the computer in solid concrete and sink it in the middle of the Marianas Trench. That's probably about as secure as you can make a computer. Unfortunately, it's not terribly useful. So a "normal geek" should seek to find the best balance between "secure" and "usable".

      Maybe the group in the discussion you referenced (and unfortunately, I missed that one...) needs more security than I do in my environment, and therefore eliminating sudo is a good choice for them. I set up my personal servers to use SSH trusts with a pass phrase to log in and then to require sudo to run anything that needs root permissions. This accomplishes several things:

      1. Remote connections can only come from hosts that I have explicitly allowed (that's why I set it up in the first place -- I was tired of seeing all of the brute-force SSH username/password guessing games in my logs).
      2. To sign in, you have to know the SSH trust passphrase, so even if you have access to a "trusted" host, there is still an authentication mechanism required
      3. Once you have logged in to the server, you must know the password for the login account (*NOT* the same as the SSH trust passphrase) to use sudo. So even though you've authenticated, you have to authenticate again with a second password to do anything very useful.
      4. Finally, since you are using sudo, I can see what users have done in my log files, and I will get an e-mail if someone tries to run sudo with an incorrect password. Consequently, unless you already know both the SSH trust passphrase AND the login account password, I'll get an e-mail as soon as you try to run anything with sudo, reducing the time an attacker has to do anything nefarious.

      I'm sure it's not a perfect solution, but it seems to work reasonably well for me.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    199. Re:Persistent myth? by element-o.p. · · Score: 1

      ...and there is what you do when every time there is a problem your phone starts ringing with people screaming at you because they can't work and why haven't you fixed the server yet and how long will it take and what about their important meeting at 2pm etc.

      I solved that problem by putting my phone on forward during an outage :) There are only two of us sys admins where I work, and there are two desktop support people. When the poo hits the fan, the desktop support guys run interference while we admins work on the problem. YMMV.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    200. Re:Persistent myth? by ByOhTek · · Score: 1

      Every part of the server application I admin on Windows or Linux is proprietary.

      Wait, no, the Windows piece uses GhostScript which isn't proprietary.

      It's very stable. Proprietary doesn't make something junk - bad developers make it junk.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    201. Re:Persistent myth? by marcosdumay · · Score: 1

      And that is why Gnu is Not Unix.

    202. Re:Persistent myth? by marcosdumay · · Score: 1

      It is not rebooting first. Normally, people who manage Windows machines do scheduled reboots. Last time I was in IT, we did weekly reboots of our Windows servers, but the frequency varies (although it is never as low as once a month, for example; it could have been on older systems, but newer versions of Windows Server are less stable).

      If you don't do the scheduled reboots, weird things start to happen, and you have no guarantee that the timing will be good.

    203. Re:Persistent myth? by Shompol · · Score: 1

      ... Linux and BSD are not Unix...Mac OS X is Unix.

      Mac OS is based on NeXTSTEP, which is based on BSD. Unlike things under FSF license, BSD can be copied and used as a base for a proprietary system, which Steve did.

    204. Re:Persistent myth? by Crackez · · Score: 1

      In my line of work, we've got better things to do than defend boxes against incompetent engineers. If you've got a logon to the box, you should be trustworthy enough to know what you're doing.

      That never works. Engineers are not SysAdmins, and rarely are they trained well enough to mimic one.

      On shared machines, no one gets root (or sudo) unless they are managing that box and are responsible for it. So, everyone gets sudo on their desktop, and only the real Admins get sudo/root on the development servers. If someone can prove a real need to run something as root, and once we in IT can verify that it's not a security hole, we generally permit only one exact command line to be run via sudo. If they cannot distill their need to a very specific command (using absolute paths, including any arguments) then we reject it.
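
      The "one exact command line" rule above can be illustrated with a minimal sketch. This is not sudo's own matching logic; the allowlist entries and the is_permitted helper are hypothetical names invented for this example. It only shows why absolute paths and fixed arguments make a request easy to verify.

```python
import shlex

# Hypothetical allowlist: each entry is one exact command line, with an
# absolute path and fixed arguments. These entries are illustrative only.
ALLOWED_COMMANDS = [
    "/usr/sbin/tcpdump -i eth0 -c 100 port 80",
    "/sbin/service httpd restart",
]


def is_permitted(command_line, allowed=ALLOWED_COMMANDS):
    """Return True only if command_line matches an allowed entry exactly."""
    try:
        requested = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not requested or not requested[0].startswith("/"):
        # Relative paths are rejected: they depend on PATH and the cwd.
        return False
    return any(requested == shlex.split(entry) for entry in allowed)
```

      Anything that cannot be reduced to one of the listed entries, including the same command with extra arguments tacked on, is refused.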

      This is not a BS policy; it was derived from real events. Despite what the engineers around here would like to believe, they aren't generally smart enough to manage a box by themselves. It's been tried, and we refuse to take over any machine they managed without a wipe/reinstall of the OS.

      "Oh but the whole build environment is set up on that machine, we'll have to do it over again!" Tough titty, learn to do it over again. If you were a good engineer or programmer, you should be able to find a way to make it not as hard next time. What's that, the guy who set it up left the company? That sucks, guess you need to figure it out anyways in case your box breaks.

    205. Re:Persistent myth? by isorox · · Score: 1

      yeah, with sudo you just do "sudo -s". now you have a nice rootshell, without any further logging and without the need to redecide if sudo is needed on every command.
      no big advantage to su. but it's easier to revoke access of one user without telling all users a new password.

      Sure, but then anyone with physical access to the machine can always log in. If you tell people to use sudo when they need to, the number of instances of this is reduced, and you can identify how something was broken.

      Sudo (and its logging) is there to learn from accidental cockups. You already trust the people.

    206. Re:Persistent myth? by kashani · · Score: 1

      As a former network engineer I salute you. The rest of you bastards, pay attention.

      --
      - Why is the ninja... so deadly?
    207. Re:Persistent myth? by DaVince21 · · Score: 1

      Replacing a BIOS battery isn't exactly difficult, though. Unless the internals can't be reached?

      --
      I am not devoid of humor.
    208. Re:Persistent myth? by badkarmadayaccount · · Score: 1

      Application errors can fuck it up - better not give the network management console rights to the boss's $HOME.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    209. Re:Persistent myth? by badkarmadayaccount · · Score: 1

      You have root - set up your own sudo, you're a big boy now.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    210. Re:Persistent myth? by badkarmadayaccount · · Score: 1

      For the last time, XNU is not a BSD variant, no more than NT is by using the network stack... Yes, there is a larger part of the code that came from there, but the freaking HAL, and driver system have nothing to do with BSD.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    211. Re:Persistent myth? by Anonymous Coward · · Score: 0

      "It's a persistent myth that calling free() after malloc() is unnecessary, but some software engineers disagree..."

      I presume you mean to call free() some time after malloc() and not immediately afterwards??

      As an order of operations, calling free() before malloc() is pretty pointless!

    212. Re:Persistent myth? by ibbie · · Score: 1

      I had a test OSX server at my last job I was at. Pretty much closed all ports except for 80 and whatever port Apple RDP ran on (actually, I closed that down at first, oops, had to take a macbook into the server room to fix that), then opened up ports as needed.

      I'm sure you realize this now, but for the sake of anyone else reading this you could have saved yourself a trip to the server room had you left 22 open; most OSX services can be stopped and started via command line, just like any other *nix. I could be wrong (I don't use Macs often), but I think they stick them in /System/Library/CoreServices, or somewhere similar.

      As far as not using sudo, I think it depends on what you're doing. For example, if you're just running a single command, it makes sense to sudo (e.g., editing /etc/hosts, restarting a service, etc), but if you're going to be working with root privileges for a while, it just makes more sense, and saves you some keystrokes, to use su.

      --
      The wise follow a damned path, for to know is to be forsaken.
    213. Re:Persistent myth? by CAIMLAS · · Score: 1

      I know that many vendors (used to?) who layered their shit on top of AIX and SCO (for instance) would recommend this. I've come across quite a few of said boxes which will reboot themselves after a nightly image + data backup script completes.

      Nightly is a bit excessive, IMO, but I see nothing wrong with a scheduled weekly or monthly reboot. Nightly/daily, and you run the risk of too many power cycles on your disks. Weekly or monthly, that's not such a problem (even with 5+ years of service, that's not so many). By scheduling your reboots you can hedge your bet slightly and catch hardware failures in the act.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    214. Re:Persistent myth? by CAIMLAS · · Score: 1

      What's funny is that 'block ICMP' is the default (IIRC) on Windows firewalls. This is an agitation beyond belief, and it is therefore disabled promptly by anyone with competence.

      Quick question: what is the single most useful and universally used network diagnostic tool in your arsenal, regardless of operating system?

      ICMP ping. It is (or should be) the one fucking universal on your network (for a damn good reason). Without it, you're stuck trying to figure out which hosts are where (let's look at ARP!), or trying to figure out which other service you can query to determine if the host has died. (Sure, good documentation, consistency, and proper infrastructure will get you around this, but when do the pedantic tools who implement such Jurassic thinking bother with any of that fluff when it's all in their head? Who cares about the next guy who comes along...)

      (Don't even get me started about conditional access restrictions within the same VLAN/subnet, often spanning multiple subnets/VLANs. What. The. Fuck.)

      Who the fuck would block ICMP? Oh, that's right: someone who doesn't troubleshoot or fix things, just goes on with their preconceived concepts of 'security'. We used to give people like this jobs as night security guards, janitors, and what have you: they're pedantic and small minded, and get in the way of everyone else when allowed out of their cages.

      (That felt good, too.)

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    215. Re:Persistent myth? by airdweller · · Score: 0

      I just tried that on my Ubuntu laptop and nothing was found. I sudo all the time on it. Any other ideas?

    216. Re:Persistent myth? by bolthole · · Score: 1

      Very secure. Good thing it isn't possible to get a copy of the "tcpdump" from somewhere else, and copy it onto that machine and run it, eh? Allowing users to download stuff from outside? well, good thing you turned that off first thing.

    217. Re:Persistent myth? by Anonymous Coward · · Score: 0

      You realize half the tools you're using are Linux-only, right? So your advice is useless to most real admins, because most real admins would never use Linux for security-sensitive work.

    218. Re:Persistent myth? by Anonymous Coward · · Score: 0

      Do you seriously have this problem? I don't use `su` because I only need to run one command, and `su -c` might as well be `sudo`. For someone being all uppity about permissions and owners and groups, why not, I don't know, change the permissions, owner, and group on these files you're so worried about? I mean, how hard is it to do `sudo chown` immediately after whatever command created some root-owned files (I've never, ever, ever had this happen to me)?

      Stop ranting about your ill-informed Linux security theater. Seriously advocating the use of extended attributes? *Seriously*? That alone should be enough evidence to ignore your advice.

    219. Re:Persistent myth? by rakaur · · Score: 1

      What are you even still talking about? What "writes a config file relative to $HOME"? I've never had anything do this. If you're running some editor for the first time ever, maybe. Even then I think most things are smart enough to look at `whoami` versus `$HOME` and figure it out. I cannot ever recall having root-owned dotfiles in my user homedirs. I just checked on the 30 or so boxes I admin, and not a single one has this issue.

      Of course, none of them run Linux.
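
      For anyone who wants to run the same kind of check, here is a minimal sketch of one way to look for files in a home directory that are owned by someone other than the expected user (for example, root-owned dotfiles left behind by sudo). The function name files_not_owned_by is made up for this example, and this is not necessarily how the check above was done.

```python
import os


def files_not_owned_by(home_dir, expected_uid):
    """Return paths under home_dir whose owner is not expected_uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(home_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself instead of following it.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

      An empty list means nothing under that directory belongs to a different owner. Directories the caller cannot read are silently skipped by os.walk, so run it with enough privilege to see the whole tree.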

  3. Uh.. no by Anrego · · Score: 5, Informative

    I for one believe in frequent-ish reboots.

    I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.

    Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.

    The "my server's been up 3 years" e-pene days are gone folks.

    1. Re:Uh.. no by Anonymous Coward · · Score: 2

      Disagree.

      Rebooting is bad. It booted the first time, Why would it not boot the second?

      If you don't have proper controls then you should not have anyone touching the box.

    2. Re:Uh.. no by JonySuede · · Score: 2

      . That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue.

      we reboot our unix server once a month exactly for this reason, we have been bitten once so we learned this the hard way.

      --
      Jehovah be praised, Oracle was not selected
    3. Re:Uh.. no by hedwards · · Score: 1

      Well, that's the thing, with cloud computing you generally don't have to waste the resources on a server that's up all the time and enough to cover the full load, depending upon the service or set up it's definitely possible to set yourself up to have additional capacity come online as needed, and for the most part those other servers are pretty much identical.

    4. Re:Uh.. no by Stenchwarrior · · Score: 1

      I agree with you. I used to build it into the cron to reboot every Sunday at 11:00p. The medical practice management software that ran on there tended to build up temp files and not remove them automatically...this was a fault of the application. My startup script would remove them and keep the hard drive (a whopping 4GB) from filling up. Since the services that needed to run were appropriately added to the same script there was never an issue of them not starting which is one of the main reasons you wouldn't want to reboot.

      --
      Loading...
    5. Re:Uh.. no by DaMattster · · Score: 2

      I for one believe in frequent-ish reboots.

      I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.

      Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.

      The "my server's been up 3 years" e-pene days are gone folks.

      Well, you make a point but, shouldn't a server be replaced when it gets old enough anyway? Wouldn't it be nice to have a server up for 3 years of reliability? At this point, who really cares if a reboot would cause a failure? You have backups, plan to replace the aging hardware. It doesn't pay to be miserly with server hardware, especially because its quality has gone on a downward trend as demand for cheaper pricing goes up. And how does verifying a system boot really ensure the server is working correctly? Too often, I have seen a server boot without problem but other latent problems arise - i.e. failing network cards and failing cooling fans.

    6. Re:Uh.. no by Gaygirlie · · Score: 2

      I do actually recommend to RTFA. He quite clearly says you shouldn't need to reboot the whole system unless you're patching kernel itself, more-or-less everything else can be just restarted or reloaded, including kernel modules, and he even backs up his argument against rash reboots with some valid logic. (Though it's something any system administrator worth anything should already know without a random person on teh internets telling him! Really, shame on you if you just reboot every time you see a problem.) He doesn't say to never reboot, either, even though the submission does make it sound like it.

    7. Re:Uh.. no by Anrego · · Score: 4, Insightful

      Maybe true if the box is set up then never touched. If anything new has been installed on it.. or updated.. I think it's a good idea to verify that it still boots while the change is still fresh in your head. Yes you have changelogs (or should), but all the time spent reading various documentation and experimenting on your proto box (if you have one) is long gone. There's lots of stuff you can install and start using, but could easily not come up properly on boot.

      And why are reboots bad? If downtime is that big a deal, you should have a redundant setup. If you have a redundant setup, rebooting should be no issue. I've seen a very common trend where people get some "out of the box" redundancy solution running... then check off "redundancy" on the "list of shit we need" and forget about it. Actually verifying from time to time that your system can handle the loss of a box without issue is important (in my view).

    8. Re:Uh.. no by jcoy42 · · Score: 2

      Well, that's your opinion.

      The boot up process starts a lot of extra electrical noise in the box by spinning up all the fans, HDs, probing things, etc. That's usually when something breaks. What I have seen is that boxes which get rebooted frequently tend to burn out faster. I have had 2 otherwise equivalent machines, purchased at the same time, one used for dev and one for production, and the dev machine burned out 2 years before we retired the production machine (burned out means too many fan/disk/CPU failures to bother with). The biggest difference? The dev machine was updated and rebooted far more frequently. The production machine we took care to only muck with when we had to, and when possible, we fixed it without a reboot.

      Now it could be that the frequent updates on the dev machine is what caused it to burn out faster (more random use), and sure, it could have been a fluke, but look at it this way- when does a light bulb burn out? When you turn it on or when it's left on?

      --
      Never trust an atom. They make up everything.
    9. Re:Uh.. no by GreyLurk · · Score: 2

      Why reboot? Why not just kill off the process, clear the temp files, and restart the process?
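
      As a rough illustration of the "clear the temp files" step only (stopping and restarting the application is left out, since that depends entirely on the software in question), here is a minimal sketch that deletes stale files from a directory. The function name purge_stale_files and the age threshold are assumptions made for this example.

```python
import os
import time


def purge_stale_files(temp_dir, max_age_seconds, now=None):
    """Delete regular files in temp_dir older than max_age_seconds.

    Returns the sorted list of removed paths. Subdirectories are left alone.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(temp_dir):
        # Skip directories and symlinks; only plain files are candidates.
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.path)
    return sorted(removed)
```

      Run from cron, something like this would keep a small disk from filling up without taking the whole box down, provided the application tolerates its old temp files disappearing.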

    10. Re:Uh.. no by dAzED1 · · Score: 1

      err...or, you could figure out why there was a problem. Rebooting a system removes a lot of forensic data, and you should know long before it's dying that there is a problem.
      There's nothing a reboot "fixes."

    11. Re:Uh.. no by Anrego · · Score: 1

      I recommend you actually read my post ;p

      I clearly said.. right there in the second paragraph.. that I agree with him on not using reboot as a troubleshooting mechanism.

    12. Re:Uh.. no by dch24 · · Score: 1

      Everything old is new again. "I can reboot an instance, it's cloud-based with HA!" That means you are not the target market for this article.

      Who do you think keeps your magical cloud running with five-9's of uptime? You can't seriously think the VM host will run better after a reboot. Who do you think manages the HA load balancer? (Hint: it is managed, just like everything else.) What if they had to reboot it?

      "I need to reboot every month/week/solar cycle because otherwise I have no disaster recovery!" I suppose if you really are worried you have a test deployment and a production deployment, and you're very careful to use tools to guarantee they stay perfectly synced. So... you reboot the test machine to test, right?

      "I can't afford a test machine, and I can't control the service configuration, so I can't guarantee it will boot up!" You're in a world of hurt. While you're at it, why don't you run the occasional rm -rf / (then hit Ctrl-C) just so you can enjoy the pleasure of a reboot?

    13. Re:Uh.. no by Yaur · · Score: 1

      Totally agree. If things fail you want it to happen when you have control of the situation, not whenever some retard decides to pull the wrong cable.

    14. Re:Uh.. no by OzPeter · · Score: 1

      Disagree.

      Rebooting is bad. It booted the first time, Why would it not boot the second?

      If you don't have proper controls then you should not have anyone touching the box.

      Even with controls you are assuming that anybody who touched the box between boots has performed their work flawlessly and/or the actions that they performed will do as expected. Yes you can replicate an environment and practice changing things and rebooting, but unless you have 100% replicated things then all you are testing is your assumption that the replication was complete. So it still comes down to an assumption that can only be tested by a physical reboot.

      --
      I am Slashdot. Are you Slashdot as well?
    15. Re:Uh.. no by digitalhermit · · Score: 1

      Interesting, but not true.

      "Frequent-ish" reboots can work in non-enterprise environments where you have downtime windows. In international organizations that run 24-7, this is rarely the case without lots of coordination. Now, if you design a system with high availability and redundancy, you can very well take down one node in a cluster for maintenance... Or if you use virtualization you could migrate the VM to another host transparently. Alas, in many enterprises there are one-off systems that exist for a particular purpose and that have gone from a skunkworks project in a business unit to a semi-critical app.

      So you end up with an app sitting on a non-redundant physical machine that cannot get a 1 hour maintenance window without extensive planning. And by this I mean alerting Madrid, London, Dubai, Cancun, Alaska, etc.. and trying to schedule hundreds of users to deal with the outage.

      The argument is that if that system is so "mission critical" then it needs to have redundancy. Hah, welcome to corporate politics where unless a system is involved with revenue generation or payroll then that project essentially has no budget.

      For this reason I love that Unix systems can go years without a reboot. I'm one of three admins that manage close to 400 OS instances... close to 50 applications. Dozens of databases... Dozens of physical machines. Rebooting just to see if the system comes back up is a pipe dream.

    16. Re:Uh.. no by tbuskey · · Score: 1

      Reboots to fix problems should never be done.

      Reboots as a matter of policy isn't a bad idea.

      If your system reboots periodically, you force network disconnections, memory cleanup, etc.

      Users that logged on months ago are no longer tying up resources. Maybe they don't need it but forgot to logout. Or their client died so there's a zombie on the server.

    17. Re:Uh.. no by OzPeter · · Score: 3, Insightful

      (wishing that /. would allow edits)

      To add to my previous comment. The general consensus of disaster recovery best practice is that you do not test a backup strategy, you test a restore strategy. Rebooting a server is testing a system restore process.

      --
      I am Slashdot. Are you Slashdot as well?
    18. Re:Uh.. no by m509272 · · Score: 1

      Agreed. Servers should be rebooted periodically. Once every 3 months is a good number. Almost every time we've had a server up for a year or two there were problems bringing it back up when it went down unexpectedly or for some sort of hardware maintenance. Of course, many of the people that were the sys admins had gone elsewhere and hours went by before they finally figured out some startup script was copied and altered just to get it to come up the last time. Better off scheduling a shutdown and restart when it's convenient.

    19. Re:Uh.. no by Stenchwarrior · · Score: 1

      We also had serial terminals attached through DigiBoard and Stallion Boards, both were notorious for flaking out unless rebooted regularly, as well. Maybe this was a unique situation but every *nix machine I built after I left the medical field received the same treatment.

      --
      Loading...
    20. Re:Uh.. no by RollingThunder · · Score: 1

      Agreed. A reboot isn't a panacea for troubleshooting, but they still should be performed. I view them as akin to drills in the military - they drill and practice so that flaws in the process can be identified early on.

    21. Re:Uh.. no by Anonymous Coward · · Score: 0

      ...and he even backs up his argument against rash reboots with some valid logic.

      Of course that logic runs down to "Unlike Windows, with Unix you can never tell if somebody hasn't screwed up vital files needed to boot the machine."

    22. Re:Uh.. no by Anrego · · Score: 1

      Oh man.. no word of a lie.. I actually _winced_ when I read DigiBoard!

      So.. much.. pain...

    23. Re:Uh.. no by Isaac-1 · · Score: 1

      BIOS battery

    24. Re:Uh.. no by Anonymous Coward · · Score: 1

      Because hardware may have gone bad, and it's better to know about it during scheduled downtime rather than in an emergency (like a power outage that lasts longer than your backup power which happened at the office) or after an accident (idiot unplugs your power cord instead of his/hers -- this actually happened to us at the server farm, and our case was locked to prevent this sort of shit.) Somehow when nothing can possibly go wrong, it does.

      We've had a hard drive spinning in a machine for the better part of a decade and found that it would not spin up again without a few power cycles.

      I know there are answers to all of these, like use a bigger UPS, replace hard drives more frequently, and find a better colocation service. That isn't to say I reboot on a schedule, I don't. My point is that sometimes a machine won't boot the second time and it might be at a time you do not want to wake up and go for a drive.

    25. Re:Uh.. no by Anrego · · Score: 1

      Good grief...

      I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again).

      Second sentence man. Not even buried... it's the start of the main paragraph. Less knee-jerk please.

    26. Re:Uh.. no by Kjella · · Score: 2

      Well, you make a point but, shouldn't a server be replaced when it gets old enough anyway? Wouldn't it be nice to have a server up for 3 years of reliability? At this point, who really cares if a reboot would cause a failure? You have backups, plan to replace the aging hardware.

      You care because it's 2:30 in the morning, your manager is yelling at you because the all important end-of-quarter stuff is due in the morning, the server is full of one day's production data that isn't backed up yet and even though you have money in the budget you don't have a hot server with the exact same software/patch level/configuration ready to dump your backups into?

      Very few systems are so critical they can't have some planned downtime. Unplanned downtime on the other hand can be extremely costly, and the only thing that matters is fixing it ASAP, save no expense. You can afford new hardware, what you can't afford is the time to install/setup that hardware.

      --
      Live today, because you never know what tomorrow brings
    27. Re:Uh.. no by Stenchwarrior · · Score: 1

      Yeah, they were HORRIBLE. But this was before Netterms and FacetWin caught on. Even then, it took those tight-wad doctors years to make the change to TCP what with the wiring and hardware costs. I still get called from time to time from an old client that STILL has a Wyse hooked up and a printer through the pass-through port because it keeps losing the emulation. I answer the phone "hello again, 1997" whenever they call.

      --
      Loading...
    28. Re:Uh.. no by GooberToo · · Score: 1

      That's exactly right. Statistically, most drive failures occur during boot.

    29. Re:Uh.. no by dAzED1 · · Score: 1

      Did you try getting to my second sentence? "Rebooting a system removes a lot of forensic data, and you should know long before it's dying that there is a problem"

    30. Re:Uh.. no by dAzED1 · · Score: 1

      so boot off a SAN, or learn how to read SMART messages.

    31. Re:Uh.. no by noidentity · · Score: 1

      I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.

      This doesn't contradict his advice, as you're suggesting to reboot every few months when the machine is working. His advice is to not use a reboot as the first step in solving a problem. If anything, a periodic reboot when it's working is probably in-line with his advice, as it's a way to uncover more problems that may be lurking, at a time when things seem to be working (and hopefully when the downtime won't be a big issue, like during low load).

    32. Re:Uh.. no by Anonymous Coward · · Score: 1

      A company I used to work for had a catastrophic electrical failure. It's a company that plays a big role in getting money moved around, so these machines were doubly and trebly "protected" against such an event, but then such is the nature of catastrophes.

      Anyway, a lot of us that had a role in support were called in to ensure that "our" systems were brought up successfully, in the right order, without any issue, etc. As it came to the "Big Iron"'s turn, a great many systems just wouldn't start. Seems some of the hard drives on these mainframes hadn't been power cycled in ages; in some cases at least a decade. And once down, down they remained. Being a Big Production Shop, we had crazy levels of vendor support, and a great deal of the rest of the day saw the vendors coming in and out with new HDs, controllers, magic circuitry, blinkenlights, and all sorts of hardware to replace these things that Just Wouldn't Go. Fun times.

    33. Re:Uh.. no by dAzED1 · · Score: 1

      and yet, you also say you believe in rebooting frequently. You know, just to clean stuff up. Cause them old pipes get clogged up, and the tubes need to be purged.
      There's no reason to reboot something just because. You destroy forensic data, you destroy indexes and caching, etc. Fix what is broken, leave the rest alone. Your dev setup is where you should be working out any issues you think might cause the system to not come up.

    34. Re:Uh.. no by Idbar · · Score: 1

      Power outages occur, no matter how hard you try to get over them. A UPS fails, battery fails, something can happen.

      The worst part of a power outage is not knowing what hung and kept the whole computing room from cleanly coming back into service.

      I myself have applied patches that don't break running stuff, but then some stuff won't boot right.

      As someone else said, it's not about one system; it's a whole computing room restore, which can take one hour or 12 depending on the system that you didn't know had hung and that the other systems needed in order to come up clean.

    35. Re:Uh.. no by Antique+Geekmeister · · Score: 1

      I suggest once a year. It's a good opportunity to test high-availability components and failover procedures, and to re-arrange power layouts and replace UPS batteries. Too many clients get pretty fussy about even scheduled reboots, and communicating the downtime to everyone in a large environment can create political battles.

    36. Re:Uh.. no by Svartalf · · Score: 1

      Once you said, "Digiboard", out the window went the argument of trying to fix things once and for all instead of just simply rebooting...

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    37. Re:Uh.. no by GameboyRMH · · Score: 1

      +1 that's the only disadvantage of never rebooting, surprise hardware death. If you reboot often you're likely to run into hints that a hardware failure is coming - unreliable bootup, hard drive spinning up slowly/multiple attempts, SMART warnings if your OS doesn't handle that, etc.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    38. Re:Uh.. no by tepples · · Score: 0

      He quite clearly says you shouldn't need to reboot the whole system unless you're patching kernel itself

      My Ubuntu box gets kernel or OpenSSL updates about as often as Windows gets Patch Tuesday. And for the most part, I don't reboot Windows or Ubuntu except A. after such updates or B. after I have shut down a laptop.* So yes, "patching kernel itself" is the excuse I most often cite.

      more-or-less everything else can be just restarted or reloaded, including kernel modules

      Unless of course the system has to be shut down to install the internal, non-hot-pluggable hardware that the kernel module talks to.

      * Yes, I use suspend on my laptop, but it still drains power over a course of days, and I'm too occupied with productive work to dick around getting hibernate working when booting is just as fast.

    39. Re:Uh.. no by Ultra64 · · Score: 1

      How would rebooting a second machine tell you if the first machine would come back up after a reboot?

    40. Re:Uh.. no by GooberToo · · Score: 1

      The problem with SMART is that many drive failures do not create SMART warnings. Furthermore, warnings are not necessarily indicative of pending drive failure. SMART, while better than nothing, is just barely so.

    41. Re:Uh.. no by Lord+Ender · · Score: 1

      Uptime shouldn't matter; availability should. Anyone who confuses the two is a noob sysadmin.
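
      To put rough numbers on that distinction, here is a minimal sketch. Both servers and their outage figures are made up for illustration; the only point is that the box with the shorter uptime can still have the better availability.

```python
YEAR_HOURS = 365 * 24  # 8760 hours in a non-leap year


def availability(period_hours: float, downtime_hours: float) -> float:
    """Return the fraction of the period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return (period_hours - downtime_hours) / period_hours


# Hypothetical server A: twelve planned 10-minute reboots in a year.
planned = availability(YEAR_HOURS, 12 * 10 / 60)

# Hypothetical server B: no reboots all year, then one 8-hour unplanned outage.
unplanned = availability(YEAR_HOURS, 8)

print(f"A (max uptime about a month): {planned:.4%}")
print(f"B (max uptime about a year):  {unplanned:.4%}")
```

      Under those assumed figures, server A never gets to brag about its uptime, yet it is the one that was available for more of the year.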

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    42. Re:Uh.. no by Anonymous Coward · · Score: 0

      You forgot "Some sysadmins might disagree, " before your addition.

      Oh wait, different thread - this was an ACTUAL straw-man, not a fake one. Carry on.

    43. Re:Uh.. no by Anonymous Coward · · Score: 0

      'Rebooting a cloned version of a server on identical hardware is testing a system restore process.'

      Having a third machine as fail-over (choose your location carefully) to mitigate downtime is just prudent.

      Rebooting a server is testing nothing but how much downtime it takes to get fired from your job.

    44. Re:Uh.. no by sznupi · · Score: 1

      Curious to hear that ending part, few days after one (long powered on) light bulb here basically exploded. ;p

      --
      One that hath name thou can not otter
    45. Re:Uh.. no by Anonymous Coward · · Score: 1

      It is amazing the deer in the headlights look you get out of some people when you ask 'did you actually TRY it?'. That would be no.

      Even on 'redundant systems' people seem to be reluctant to try it out.

    46. Re:Uh.. no by sqldr · · Score: 1

      thanks to doing all that fancy configuration management stuff, our servers get package updates quite frequently. we're bang up to date. alas this means kernel upgrades occasionally. That requires a reboot. If your server has been up for 3 years then it's got a 3 year old kernel on it, probably with bugs.

      --
      I wrote my first program at the age of six, and I still can't work out how this website works.
    47. Re:Uh.. no by Anonymous Coward · · Score: 0

      There's nothing a reboot "fixes."

      ... an Oops?

    48. Re:Uh.. no by element-o.p. · · Score: 1

      Rebooting is bad. It booted the first time, Why would it not boot the second?

      Please tell me you aren't seriously asking that question?

      Here are a few examples:
      1) Because the RAID controller battery died, and you didn't know it because you never, ever reboot your box. Consequently, when your server dies while you are on vacation in Hawaii, you end up talking an entry-level desktop support monkey through your RAID setup via telephone;
      2) Because when your vendor SSH'd into the box, the moron deleted /var (or /boot) while troubleshooting (no joke, this actually happened to me once);
      3) Because hardware ages, and problems that may lie dormant can be exposed when the server is rebooted. I've seen Dell USFF desktops work fine until they were rebooted, but then fail to come back up because some caps on the motherboard were leaking. The desktop would have eventually failed, but the reboot revealed the problem earlier. Better to discover that your server is on its last legs during a maintenance window than in the middle of the production day.
      4) Again, because hardware ages, and problems that may lie dormant can be exposed when the server is rebooted. We had another server lose a hard drive (no RAID). All of its services were running from a RAM drive and writing to NFS mounts, so we had no idea the HDD had died until a power outage took it off-line, and the server didn't come back up afterwards.
      5) You do update your server from time to time, don't you? Do you know that all of your configs, etc., are still valid for the updated services, kernels, etc.? I once had to administer a VMWare server that required work every time we updated the Linux kernel for VMWare to start. If you updated the kernel, didn't reboot the server and run the proper tweaks, then subsequently had a power failure, the server would start, but none of the guest OS's would. Rebooting the server after updates proved that the guest OS's would be available after a power failure.

      YMMV, so do what works in your environment, but IMHO, periodic reboots are a good way of verifying that your server will come back on-line if something were to take it off the air at night, while you are on vacation, etc.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    49. Re:Uh.. no by jrobot · · Score: 1

      Not just drives, but many electrical component failures are from inrush current at power on (similar to lightbulbs burning out when switched on)

    50. Re:Uh.. no by dch24 · · Score: 1

      Read my post again. Here, I'll bold it for you: "I suppose if you really are worried you have a test deployment and a production deployment, and you're very careful to use tools to guarantee they stay perfectly synced."

    51. Re:Uh.. no by nabsltd · · Score: 1

      It booted the first time, Why would it not boot the second?

      I recently had a long enough power outage that my Linux servers needed to be shut down (I don't have a generator at home).

      One of them came back up running really slow. It turns out one CPU fan had seized after stopping and cooling down and wouldn't restart, so the CPU went into thermal slowdown.

      So, to answer your question, any of millions of random things can cause the server not to boot correctly the second time.

    52. Re:Uh.. no by ukyoCE · · Score: 1

      I've seen servers fail to come up for a reboot because of hardware failures before. I'm not sure if I would recommend an annual reboot or what, but scheduling a reboot to test for that sort of issue lets you pick the time for a potential recovery effort. If an unexpected power or software failure causes the reboot, you could find yourself dealing with multiple issues at once, which can make troubleshooting and recovery much harder.

      I've also seen unix boxes start having issues after being up too long (again, multiple years) and having certain integers wrap. The kernel hacker at my company had seen the issue multiple times before and recognized it, I was just the on-site monkey at the time, so I don't recall more specific details.
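
      For anyone wondering what "integers wrap" can look like, here is a sketch of one classic case: a 32-bit unsigned tick counter that is incremented a fixed number of times per second. This is an assumption chosen for illustration, not necessarily the bug described above, and the tick rates are example values.

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(bits: int, ticks_per_second: int) -> float:
    """Days of uptime before an unsigned counter of the given width overflows."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


def counter_value(total_ticks: int, bits: int = 32) -> int:
    """Value actually stored in a fixed-width unsigned counter."""
    return total_ticks % (2 ** bits)


for hz in (100, 1000):  # example tick rates
    print(f"{hz} ticks/s: wraps after {days_until_wrap(32, hz):.1f} days")

# Just past the wrap point the counter looks like the machine only just booted.
print(counter_value(2 ** 32 + 5))
```

      Any code that compares two readings of such a counter without accounting for the wrap will see time appear to jump backwards, which is the kind of thing that only shows up on long-running boxes.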

    53. Re:Uh.. no by archen · · Score: 1

      The article summary is a bit inaccurate. It would be better described as you should never need to reboot a unix server to fix a problem. Personally I like to reboot machines about every 120 days or so - with FreeBSD boxes I just do it during version upgrades which is close enough. On boxes others mess with, it's a good idea just to make sure it comes up clean. It's come back to bite me in the ass before where there was a major problem (like a power loss), I wasn't there and when someone else at the company brought it up there were problems - sometimes compounded with many changes made incrementally over time. While already being in panic mode, they screw things up more and turn it into a clusterfuck.

      Just because you never plan on having a fire doesn't mean you shouldn't have a fire drill.

    54. Re:Uh.. no by mrrudge · · Score: 1

      Pssst. Your live box is probably getting more use than the test server, so its hardware is likely to fail first. Your duplicate is just software, whether it boots or not is very little indication of whether the live one will boot also. I can

    55. Re:Uh.. no by ben_kelley · · Score: 1

      +1
      If you never tested it, how do you know it works? Although I guess you could take this too far.... OCD anyone?

    56. Re:Uh.. no by hjf · · Score: 1

      apt-get update && apt-get -y --force-yes upgrade && reboot

      OOPS! sources.list said "testing" instead of "squeeze", now your box is hosed, because debian moved releases and you didn't do a manual dist-upgrade
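
      A cheap guard against that particular trap is to scan sources.list for rolling suite aliases before letting an unattended upgrade loose. The sketch below assumes the classic one-line `deb uri suite components` format; the function name, the alias set, and the sample file contents are all made up for illustration.

```python
ROLLING_ALIASES = {"oldstable", "stable", "testing", "unstable"}


def rolling_suites(sources_text: str):
    """Return (line_number, suite) for entries that track a moving alias."""
    hits = []
    for lineno, raw in enumerate(sources_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] not in ("deb", "deb-src"):
            continue
        rest = fields[1:]
        # Skip an optional "[ option=value ... ]" block before the URI.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        if len(rest) < 2:
            continue  # malformed entry, ignore
        suite = rest[1]
        # "testing/updates" and "testing-updates" both track "testing".
        base = suite.split("/", 1)[0].split("-", 1)[0]
        if base in ROLLING_ALIASES:
            hits.append((lineno, suite))
    return hits


SAMPLE = """\
deb http://ftp.debian.org/debian testing main
deb http://security.debian.org/ squeeze/updates main
# deb http://ftp.debian.org/debian unstable main
"""

for lineno, suite in rolling_suites(SAMPLE):
    print(f"line {lineno}: suite '{suite}' will silently change at the next release")
```

      Pinning to a codename such as "squeeze" means the jump to the next release only happens when someone deliberately edits the file.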

    57. Re:Uh.. no by NieXS · · Score: 1

      Users that logged on months ago are no longer tying up resources. Maybe they don't need it but forgot to logout. Or their client died so there's a zombie on the server.

      Obligatory xkcd

    58. Re:Uh.. no by msi · · Score: 1

      Very few systems are so critical they can't have some planned downtime.

      Nothing is so critical that it cannot have planned downtime; the more critical a system is, the more downtime it requires for planned maintenance. If the service cannot go down, it needs to be clustered, preferably on different sites with different power etc.

      This is why the USA has the 25th amendment.

      However, rebooting is not a troubleshooting procedure it is a maintenance procedure.

    59. Re:Uh.. no by Yunzil · · Score: 1

      Rebooting is bad. It booted the first time, Why would it not boot the second?

      They're so cute when they're young. :3

    60. Re:Uh.. no by sl149q · · Score: 1

      I've always suggested that Linux (any Unix) servers should be rebooted once a year just to make sure that they can reboot and get all their services restarted. Pick a time when usage is down, e.g. between Xmas and New Year's for non-commerce type servers. If it breaks, your Unix admin has a week to put the pieces back together before people show up expecting to use it.

      Typically this is really just a test of the hardware and startup scripts (for updated packages that may not have correctly updated their startup scripts.)

    61. Re:Uh.. no by PCM2 · · Score: 1

      Even then, it took those tight-wad doctors years to make the change to TCP what with the wiring and hardware costs.

      You wouldn't be alone in your experience. I don't know if it's really being tight-wads or just general reluctance to commit to IT, but I've never heard of such a thing as a doctor whom you could describe as being "proactive" or "forward-thinking" when it comes to computer technology. Imagine your experience repeated thousands of times over, all across the country, with the move toward electronic medical records...

      --
      Breakfast served all day!
    62. Re:Uh.. no by Anonymous Coward · · Score: 0

      would you rather be bitten in the middle of the day, or on the planned monthly window?

    63. Re:Uh.. no by Anonymous Coward · · Score: 0

      Hmm, so obviously you don't have any serious users that run calculations that can take months to complete. So let's talk again when you are running a real computer.

    64. Re:Uh.. no by trolman · · Score: 1

      That's exactly right. Statistically, most drive failures occur during boot.

      Depends. If you don't reboot that often, it's the opposite. Most of my drive failures occur overnight when the 'hard drive fairy' shows up. I have hundreds of spinning drives on lots of Unix, Linux, and Windows servers. In the past six years, all but one drive failure have been sans reboot. Our www box actually kept running and serving pages after its RAID1 locked up due to one disk having a strange problem. It had uptime of around a year. Of course that is Slackware Linux for you. The Windows boxes just choke and die at the first sign of a bad OS disk.
      Now that brings us to the re-boot vs uptime issue. I have learned a lot over the past 28 years, and a lot of learning on Linux thanks to all the wonderful Slashdot community. So if I reboot on Sunday night before a Monday holiday and the box dies, now I have some breathing room on the non-24/7 services. For the 24/7 sites I might want to do some planning and schedule a maintenance window.
      It kinda scares me to see a year of uptime on an email server or two years on a router. What will happen when the generator/ats/ups does not work and it does re-boot? That is 'when' not if and I don't care how big your 'n' number. So pick a holiday weekday when you can get vendor support, maybe even warn the vendor of the maintenance windows, and re-boot.
      I will get right around to that real soon myself....
      rAG:~# w
       06:08:14 up 647 days, 9:17, 1 user, load average: 0.03, 0.02, 0.00
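
      If you want a nag instead of relying on "real soon", a sketch like the following could run from cron. It assumes a Linux box, where the first field of /proc/uptime is seconds since boot; the 365-day limit is a hypothetical policy, not a recommendation from anyone in this thread.

```python
MAX_DAYS_WITHOUT_BOOT_TEST = 365  # hypothetical policy threshold


def uptime_days(proc_uptime_text: str) -> float:
    """Convert the contents of /proc/uptime to days since boot."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 86400


def boot_test_overdue(days: float, limit: float = MAX_DAYS_WITHOUT_BOOT_TEST) -> bool:
    """True when the box has gone longer than the limit without a reboot."""
    return days > limit


try:
    with open("/proc/uptime") as fh:
        days = uptime_days(fh.read())
    print(f"up {days:.1f} days, boot test overdue: {boot_test_overdue(days)}")
except OSError:
    print("no readable /proc/uptime here (probably not Linux)")
```

      Fed the 647 days, 9:17 shown above (55934220 seconds), it reports roughly 647.4 days and flags the box as overdue under that assumed limit.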

    65. Re:Uh.. no by drinkypoo · · Score: 1

      I for one believe in frequent-ish reboots.

      After having read the article then I have several large problems with it.

      1. Rebooting Windows IS necessary, because the kernel blows goats.
      2. Rebooting should be considered MANDATORY after kernel patching or even driver replacement if the system depends on that driver to boot. You don't want its inability to boot to come as a surprise.
      3. If a junior admin has deleted big portions of /etc then I'm probably not going to be able to log in anyway. (I like the weasel words about "some portions" of /etc)
      4. Don't tell me not to give any "pedantic nonsense" about open filehandles; I can't be the only one who's ever had a zombie process hold open a file. Just can't.
      5. ...and if I have bad RAM I'm going to have to reboot to run memtest, because my PC probably isn't going to give me useful diagnostic information ALA a "real" UNIX machine even if I have ECC.

      Scheduled reboots are a critical part of proper maintenance of anything which absolutely can never go down. Any architecture that depends on a particular piece of hardware never failing is already a failure.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    66. Re:Uh.. no by Stenchwarrior · · Score: 1

      I have a friend that works at an internet-based insurance claims clearing house (in a position I held previously) and he said that just getting some of the rural docs to get internet connections was like pulling teeth. I had the same issue 5 years ago when I worked there but today it's not any better. I can't even imagine how bad it's going to be when they have to standardize the database formats, or at least the exported data, into something that all systems can understand and work with. I'm guessing they'll go with the ANSI X12 for claims (right now there are many different formats) but for the actual records, no idea.

      --
      Loading...
    67. Re:Uh.. no by marcosdumay · · Score: 1

      A computer may fail to start-up for several reasons...

      • Failed hardware. Your root array simply goes away, and you didn't notice because everything is already on RAM, and you didn't set proper notification, for example. Or an array controller (yep, I've seen that), or some completely unneeded hardware, that the BIOS complains and asks you to press F1 before proceeding.
      • Corrupted data. Due to something weird (maybe even a cosmic ray) the configuration script (or maybe even boot information) of your business application changed.
      • Software bugs. It started ok in 1999, but who knows if the startup procedure will work in the year 2001?
      • Untested changes. The normal reason, you just change one little setting and have no time to test it now (taking the computer offline is hard). Then, you change another little thing, that also can't be tested now... And when you see it, some file that should be 700 is 600, and nothing works anymore.

      Of course, the above list is not exhaustive, and of course it doesn't mean that you should reboot your servers every month. But it hurts very little to test the startup once in a while (maybe after you change anything, for example).
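
      The "should be 700 but is 600" kind of drift from the list above is one of the few items you can check without rebooting. A minimal sketch, assuming you keep a table of expected modes somewhere; the function name and the demo file are hypothetical.

```python
import os
import stat
import tempfile


def mode_mismatches(expected):
    """Compare files against expected permission bits.

    expected maps path -> octal mode. Returns (path, expected, actual) tuples,
    with actual set to None when the file is missing.
    """
    problems = []
    for path, want in expected.items():
        try:
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            problems.append((path, want, None))
            continue
        if actual != want:
            problems.append((path, want, actual))
    return problems


# Demo on a throwaway file standing in for a startup script.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "start-app.sh")  # hypothetical name
    with open(script, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(script, 0o600)  # the little change nobody had time to test
    for path, want, got in mode_mismatches({script: 0o700}):
        found = "missing" if got is None else f"{got:o}"
        print(f"{path}: expected {want:o}, found {found}")
```

      This only catches the failures you thought to list, so it complements a real startup test rather than replacing it.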

    68. Re:Uh.. no by marcosdumay · · Score: 1

      I'd add that if your system is critical enough that planned downtime will hurt you, you'd better invest in some high availability setup. And if you have a high availability setup, rebooting your servers is no big deal.

    69. Re:Uh.. no by bdbr · · Score: 1

      I once accepted a job offer with the criteria that any failover would be periodically tested. I was frustrated with business groups refusing to test failover, or (worse) *knowing* that something wasn't working quite right and not doing anything about it. Unscheduled failures will most likely fail when the right person to fix it isn't there. It's just a matter of odds - there are 168 hours a week, but only 40 work hours and even fewer when the right person is not on vacation or off doing something else. With a scheduled failover, you can have *all* the right people on hand to make sure it works. I got the guarantee, so I took the job. They reneged on the promise a few months later.
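
      The odds argument above is easy to put numbers on. This sketch assumes failures are equally likely at any hour of the week, which is a simplification; the four weeks of leave is a made-up figure.

```python
HOURS_PER_WEEK = 7 * 24  # 168
WEEKS_PER_YEAR = 52


def chance_expert_on_site(work_hours_per_week: float = 40.0,
                          weeks_away_per_year: float = 0.0) -> float:
    """Probability that a uniformly random failure lands while the right person is at work."""
    weeks_present = WEEKS_PER_YEAR - weeks_away_per_year
    return (work_hours_per_week * weeks_present) / (HOURS_PER_WEEK * WEEKS_PER_YEAR)


print(f"40 of 168 hours:          {chance_expert_on_site():.1%}")
print(f"with 4 weeks of vacation: {chance_expert_on_site(weeks_away_per_year=4):.1%}")
```

      So under that assumption, better than three out of four unscheduled failures happen with the right person somewhere else, while a scheduled failover test puts the figure wherever you choose.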

    70. Re:Uh.. no by hicksw · · Score: 1

      The "my server's been up 3 years" e-pene days are gone folks.

      How right you are.

      My cluster has been up for fifteen years....
      --
      If you are looking for insight, you've come to the wrong place.

    71. Re:Uh.. no by badkarmadayaccount · · Score: 1

      Define replicate: the hardware as well? A VM ought to be fine for testing system configuration - just boot it on the production server, from the same partition, as read-only. Weird driver interactions aside - I don't see where the problem is. Maybe if you are running the server under a type 1 hypervisor like Xen, you can even test the machine booting with actual drivers, if you suspend the production VM for a few seconds (don't forget the hypervisor level caching can make booting much faster).

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    72. Re:Uh.. no by badkarmadayaccount · · Score: 1
      • Hardware RAID - with no diagnostics reporting to the OS? Lame.
      • You give someone su, and not sudo access? That aside, some dirs need to be locked for runlevel 4 (or whatever is the production default). Not sure if linux has it, though.
      • No RAID, on a server? That aside, why aren't you running your systems on PXE boot, or similar? And why didn't the OS report it losing a HDD?
      • Those tweaks need to be run on boot, scripted, with anacron, and request operator assistance if required, before reaching production runlevel.
      --
      I know tobacco is bad for you, so I smoke weed with crack.
    73. Re:Uh.. no by badkarmadayaccount · · Score: 1

      Power doesn't have a redundant route from socket to motherboard? Not a good idea. Power layout changes should be a non-event, modulo PSU failures. Your UPS doesn't have redundant batteries? Hell, the battery controller should be external to the UPS. And be redundant.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    74. Re:Uh.. no by badkarmadayaccount · · Score: 1

      System imaging and net-boot.

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    75. Re:Uh.. no by CAIMLAS · · Score: 1

      There are no controls which can be instigated to prevent one or more disks from failing after being rebooted after 3+ years of uptime, regardless of the batch or binning of the disks in question. It can happen.

      Realistically, frequent (once a month or so?) reboots are actually a part of this control process. It assures that upgrades to the kernel get applied, and any service configuration changes made will, barring someone's forgetfulness, still be applied.

      God forbid, there is a catastrophic power failure (eg. a hurricane or week-long storm, maybe), and all your servers go down. Frequent reboots (again, probably no more than monthly) will assure that the machines are at least relatively unlikely to suffer hardware failures on coming back alive, and that any system configuration changes made prior to the reboot (but without the necessary service HUP) will, in all likelihood, be applied properly.

      Please don't presume your OS is so secure and awesome that it does not need these reboots. You are wrong.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    76. Re:Uh.. no by georgesdev · · Score: 1

      yep. Imagine the scenario: you configure and boot up the server, a year later you quit, 3 years later someone reboots it for hardware maintenance, and guess what, it does not boot up correctly that time. Good luck to them trying to figure out why and how to fix it!

    77. Re:Uh.. no by georgesdev · · Score: 1

      It booted the first time, Why would it not boot the second?

      - The time changed (y2k, September 9 1999, ...)
      - The database changed
      - The application software changed
      - The servers that this server talks to when starting up have been changed
      - Someone made a bad edit in /boot
      - Etc ...
      Bottom line is reasons for failure accumulate over time. If you reboot only every 3 years, there's a chance you'll have multiple causes of failure, and that's going to be a big pain to fix, plus no one will remember what changed over the course of those 3 years ...

    78. Re:Uh.. no by sznupi · · Score: 1

      it needs to be clustered, preferably on different sites with different power etc. This is why the USA has the 25th amendment.

      Sadly, in real world we don't embrace opportunities anyway - like at my place, when the second Kaczynski twin was available as a backup ;/

      --
      One that hath name thou can not otter
  4. slashdot: *world link farmers by Anonymous Coward · · Score: 5, Insightful

    i'm really tired of this semi-technical stuff on slashdot that seems aimed at semi-competent manager-types.

    1. Re:slashdot: *world link farmers by ColdWetDog · · Score: 1

      i'm really tired of this semi-technical stuff on slashdot that seems aimed at semi-competent manager-types.

      Well, there's Digg for totally non technical stuff aimed at completely incompetent manager types. Your choice.

      --
      Faster! Faster! Faster would be better!
    2. Re:slashdot: *world link farmers by Mashiki · · Score: 1

      Well turning /. into VSR isn't going to garner them any support from the monkeys that still work in the field and come here for occasional interesting stuff.

      --
      Om, nomnomnom...
    3. Re:slashdot: *world link farmers by Anonymous Coward · · Score: 0

      You should probably get used to it, I personally know a few fully incompetent manager types who read slashdot.

    4. Re:slashdot: *world link farmers by makapuf · · Score: 1

      I personally find these truly the most useful posts, not for some braindead random blog "article" but for the competent comments.

    5. Re:slashdot: *world link farmers by ddd0004 · · Score: 0

      "Semi-competent manager-types?" I'm not sure I can believe such far-fetched tales as this.

    6. Re:slashdot: *world link farmers by Anonymous Coward · · Score: 0

      you really sound like a Microsoft sysadmin that is tired of living because of so much rebooting servers

    7. Re:slashdot: *world link farmers by cortana · · Score: 1

      Read http://lwn.net/ instead.

    8. Re:slashdot: *world link farmers by Weedhopper · · Score: 1

      Thanks for putting the finger on it. I read the article and scratched my head a little thinking WTF is this guy babbling on about?

  5. Counter point -- pre-emptive reboot by Syncerus · · Score: 5, Insightful

    One minor point of disagreement. I'm a fan of the pre-emptive reboot at specific intervals, whether the interval be 30 days, 60 days, or 90 days is up to you. In the past, I've found the pre-emptive reboot will trigger hidden system problems, but at a time when you're actually ready for them, rather than at a time when they happen spontaneously ( 2:30 in the morning ).

    --
    "Man is nothing without the works of man" -- Helvetius
    1. Re:Counter point -- pre-emptive reboot by Wovel · · Score: 2

      Interestingly, all his arguments against rebooting would bolster your argument for periodic planned reboots. One of his points was that someone may have screwed up the system, it would be better to find that in a controlled environment.

      I will stay away from periodic reboots and remain firmly entrenched in the land of if it ain't broke, don't fix it.

    2. Re:Counter point -- pre-emptive reboot by arth1 · · Score: 1

      What's the purpose of this scheduled reboot, though?
      What, exactly, are you trying to pre-empt?

      If it doesn't serve a purpose, or the purpose can be solved without causing downtime, just don't do it.

    3. Re:Counter point -- pre-emptive reboot by DMUTPeregrine · · Score: 1

      Most drive failures seem to happen at boot time (at least with your normal spinning rust type.)

      --
      Not a sentence!
    4. Re:Counter point -- pre-emptive reboot by grilled-cheese · · Score: 1

      I've found the pre-emptive reboot will trigger hidden system problems, but at a time when you're actually ready for them, rather than at a time when they happen spontaneously ( 2:30 in the morning ).

      Funny; 2:30am is when I tend to schedule maintenance on servers so that users don't experience downtime.

    5. Re:Counter point -- pre-emptive reboot by TemporalBeing · · Score: 1

      Most drive failures seem to happen at boot time (at least with your normal spinning rust type.)

      But you get longer life out of the drive by not ever spinning them down. So, shutting down the system will degrade the life of the drives, and the drives are typically smart enough now to (i) alert to failure before it happens so you can switch drives, and (ii) on servers they are hot swappable in most cases.

      So again, your point is?

      --
      Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
    6. Re:Counter point -- pre-emptive reboot by Ultra64 · · Score: 1

      The purpose of it is to ensure that it will still come back up after an accidental reboot. (power outage, etc)

    7. Re:Counter point -- pre-emptive reboot by purpledinoz · · Score: 1

      Dude, did you even read the post?

    8. Re:Counter point -- pre-emptive reboot by ducomputergeek · · Score: 1

      Problem there is if it's broke you don't know about it until shit has hit the fan and now you could be facing a bunch of problems that are compounded by other problems. When I ran an Ecommerce system, we'd take servers down at least once a month just to make sure. Usually it would be after applying any patch having to do with PCI compliance just to make damn sure it didn't break anything.

      Of course I had the luxury the last few years of having redundant systems. Mileage may vary.

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    9. Re:Counter point -- pre-emptive reboot by element-o.p. · · Score: 1

      The answer is right there in his second sentence: "I've found that the pre-emptive reboot will trigger hidden system problems..." The purpose is to verify that the hardware is still working as it should.

      I mentioned elsewhere in the comments that I personally have seen a server keep running with a failed hard drive (non-RAID, obviously) for several months. In this particular case, everything the server needed to keep running was available in RAM and NFS mount, but the server obviously couldn't boot up again after a power outage took it off-line. This caused an unplanned outage in the middle of a production day, when a scheduled reboot in the middle of an outage window would have been much less catastrophic.

      --
      MCSE? No, sir...I don't do Windows. Yes, I am an idealist. What's your point?
    10. Re:Counter point -- pre-emptive reboot by DMUTPeregrine · · Score: 1

      Kernel security updates. Application updates that change their init scripts. Etc. Anything that happens on boot can fail when changed, and it is better to have the failures occur during scheduled downtime than unscheduled. Those changes should be rare, but reboots for security updates aren't that rare. Hardware failures are just one of the many possible failures.

      --
      Not a sentence!
    11. Re:Counter point -- pre-emptive reboot by Anonymous Coward · · Score: 0

      I think the idea is that you make sure that it can still boot up again at a time where you'll be able to fix it properly in the event that it can't boot up. The alternative is having the machine go down when you aren't expecting it and then failing to boot, which may mean a very sloppy fix and a lot of irate people.

    12. Re:Counter point -- pre-emptive reboot by Anonymous Coward · · Score: 0

      Our previous admin was like this - well okay, whatever floats your boat, except he did not make his changes persistent. Fast forward a year: a long blackout, UPS batteries eventually drop dead (and the site is not important enough to have a generator); when power was restored, 1) we found that multiple disks were badly broken, as they apparently weren't accessed at all for months (which was not an immediate problem, we replaced and resynced), but also 2) configuration was outdated (as it wasn't persisted to disk in months, which explained the no-accesses operation). Cue a multiple-hour delay in operations, as the changes had to be re-applied.

      In all, yes, don't fix it if it works, but check your assumptions - "works" IMO includes "can boot into a current, known good state".
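
      A minimal sketch of the "check your assumptions" step, assuming the running and on-disk settings can each be dumped into a simple key/value mapping (the setting names and values below are hypothetical, not taken from the post). It only reports drift; how you collect the two mappings depends on the device or service.

```python
def config_drift(running, persisted):
    """Return {key: (running_value, persisted_value)} for every setting that differs.

    A missing key is reported with None on the side where it is absent.
    """
    drift = {}
    for key in sorted(set(running) | set(persisted)):
        live = running.get(key)
        saved = persisted.get(key)
        if live != saved:
            drift[key] = (live, saved)
    return drift


# Hypothetical example data: two changes were made live but never saved.
running = {"gateway": "10.0.0.1", "mtu": "9000", "ntp": "10.0.0.5"}
persisted = {"gateway": "10.0.0.1", "mtu": "1500"}

for key, (live, saved) in config_drift(running, persisted).items():
    print(f"{key}: running={live!r} persisted={saved!r}")
```

      An empty result is the "can boot into a current, known good state" case; anything else would be lost at the next power cycle.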

    13. Re:Counter point -- pre-emptive reboot by Maximum+Prophet · · Score: 1

      I've found the pre-emptive reboot will trigger hidden system problems, but at a time when you're actually ready for them, rather than at a time when they happen spontaneously ( 2:30 in the morning ).

      Right on the money, but what do you say to management that insists that reboots be done during non-business hours, like 2:30am?
      Basically, our management is paranoid. They put controls in place, but never really trust those controls. Since the illusion of control is more important than actual control, real risk isn't mitigated.

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
    14. Re:Counter point -- pre-emptive reboot by TemporalBeing · · Score: 1
      Let's break this down:

      Kernel security updates.

      A number of OS Kernels can now reload the kernel or apply security updates on the fly. The Linux Kernel included. So no, a reboot is not necessary there. FYI - Windows is perhaps the only OS that cannot reload device drivers on the fly.

      Application updates that change their init scripts.

      If you are updating the application you better be restarting it, and that means running the init script to stop, start, and restart it again. So no, a reboot is (again) not necessary there either.

      Hardware failures are just one of the many possible failures.

      And again, in a server environment you have things to detect hardware failures coming on. If your server doesn't have that, then you need to get a better server - a real one.

      Anything that happens on boot can fail when changed, and it is better to have the failures occur during scheduled downtime than unscheduled.

      And you seem to be missing the point that 99.9999% of what happens at boot can be detected without rebooting the system on most operating systems; Windows is the exception to that due to all its bloat and interlock, etc.
      Don't get me wrong - having a scheduled downtime is good. But you don't necessarily need to reboot during downtime - just pull the system out of active service, make the changes, test, and then restore to active service.

      --
      Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
  6. Of course you reboot, in controlled settings by pipatron · · Score: 4, Insightful

    FTFA:

    Some argued that other risks arise if you don't reboot, such as the possibility certain critical services aren't set to start at boot, which can cause problems. This is true, but it shouldn't be an issue if you're a good admin. Forgetting to set service startup parameters is a rookie mistake.

    This is retarded. A good admin will test so that everything works, before it will get a chance to actually break. Anyone can fuck up, forget something, whatever. Doesn't matter how experienced you are. Murphy's law. The only way to test if it will come up correctly during a non-planned downtime is to actually reboot while you have everything fresh in memory and while you're still around and can fix it. Rebooting in that case is not a bad thing, it's a responsible thing to do.

    --
    c++; /* this makes c bigger but returns the old value */
    1. Re:Of course you reboot, in controlled settings by Syncerus · · Score: 1

      I agree with your comments completely.

      --
      "Man is nothing without the works of man" -- Helvetius
    2. Re:Of course you reboot, in controlled settings by Darth_brooks · · Score: 1

      Word.

      Reboots are a nice test of "Oh shit" situations, such as complete power failures. There are a lot of admins out there who don't have the luxury of giant battery backups that will cover everything until the automatic generators kick in. There's something that's just a tiny bit comforting about watching the machine from push to post to prompt. You know if the CMOS or RAID controller batteries are bitching about needing to be replaced (even if SNMP *might* be able to tell you this). You know that there's an NFS mount that takes a ridiculous amount of time to complete, and there are precious few ways of verifying that a BIOS upgrade went through successfully without watching.

      Uptime numbers are just penis wagging.

      --
      There are some people that if they don't know, you can't tell 'em.
    3. Re:Of course you reboot, in controlled settings by Anonymous Coward · · Score: 0

      Gee, I wonder why you didn't quote the very next sentence: "Naturally, if you're building the box and it's not in production, you can do all the reboot tests you want without adverse effects. That's just good practice."

    4. Re:Of course you reboot, in controlled settings by weicco · · Score: 1

      Doesn't matter how experienced you are. Murphy's law

      You are absolutely right! I once prepared myself mentally and the server physically for a reboot. I checked every startup parameter, checked that no-one is accessing the server at the moment etc. etc. When I was ready and the time came to reboot I clicked Reboot ... No I didn't! I clicked Shutdown! Fsck!

      Server was in another building, in another city some 150 km away...

      --
      You don't know what you don't know.
    5. Re:Of course you reboot, in controlled settings by Anonymous Coward · · Score: 0

      Completely agreed. I used to run a small Debian server on a 1990s Power Mac at home, and got into an uptime groove; well over four hundred days of uptime (and on the power in my building, that was quite something). Then, while I was away from home, the power cut out, and the machine came back up without networking.

      Why?

      I eventually traced it to a small obscure change in one security update that only affected old-world (think non-jellybean) ROM macs with one brand of PCI ethernet card. So obscure, but it hit me, so obscure it wasn't documented except by people it'd also hit after being applied.

      Those people who'd rebooted on the day had fixed it with minutes of downtime. I lost days. I'd rather have none, but when I have to I'll take it on my schedule thanks.

    6. Re:Of course you reboot, in controlled settings by nabsltd · · Score: 1

      So, you just connected to the IPMI/ILO/DRAC/KVM over IP/whatever and selected "power on", right?

      Seriously, this sort of capability seems like a requirement for every server used for "real", and even my home servers have it, at about $10 price premium on the motherboard.

    7. Re:Of course you reboot, in controlled settings by trolman · · Score: 1

      Uptime numbers are just penis wagging.

      AIX server; in 11 years has only been down for planned outages. The UPS units have been replaced twice in this server's life.
      xxxx # w
      12:23AM up 410 days, 4:45, 8 users, load average: 0.00, 0.01, 0.06

      yep, Vyatta router has never been down, ever.
      rJDxxxx:~# w
      06:23:48 up 662 days, 12:01, 1 user, load average: 0.00, 0.00, 0.00

      Disclaimer:
      Minimizing the downtime is the driving force. The critical stuff like email, public access, routers, network devices get rebooted during a sane maintenance window. Sometimes it is a year, sometimes a month. Depends on the window(s) of opportunity. Of course the Windows servers are re-booted at least once a month to keep up with black Tuesday critical security patches. Linux I would say averages every six months. Most years the email and web servers have been up all year so I reboot them around new years.

      Monday was a holiday here and the airport admin staff was supposed to be gone. So I rebooted the microwave stuff that had 170 days of uptime at 1 a.m. Guess what; one of the SUs didn't have a gateway address. Two hours of network downtime later it was all back to normal. If that happened during a big event or just in the middle of the day...well it is just a municipal airport but they are important to us.

      Bottom line: Proper maintenance makes 8a.m. on Monday a non-event. It stops the phone from ringing and the sound of boots approaching my door.
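
      For anyone tracking how long boxes have gone since their last maintenance reboot, the `w` header lines quoted above can be turned into a number. This is a rough sketch that only assumes the "up N days, H:MM" and "up N min" shapes; other `uptime` output variants are not handled, and the function name is made up for illustration.

```python
import re

# Matches "up 410 days, 4:45", "up 12:01" and "up 12 min" style headers.
UPTIME_RE = re.compile(r"\bup\s+(?:(\d+)\s+days?,\s*)?(?:(\d+):(\d+)|(\d+)\s+min)")


def uptime_minutes(header):
    """Return the uptime in minutes parsed from the first line of `w` or `uptime`."""
    match = UPTIME_RE.search(header)
    if match is None:
        raise ValueError(f"unrecognised uptime header: {header!r}")
    days = int(match.group(1) or 0)
    if match.group(4) is not None:  # "up 12 min" form
        return days * 1440 + int(match.group(4))
    return days * 1440 + int(match.group(2)) * 60 + int(match.group(3))


aix = "12:23AM up 410 days, 4:45, 8 users, load average: 0.00, 0.01, 0.06"
vyatta = "06:23:48 up 662 days, 12:01, 1 user, load average: 0.00, 0.00, 0.00"
print(uptime_minutes(aix) // 1440, "days")     # whole days for the AIX box
print(uptime_minutes(vyatta) // 1440, "days")  # whole days for the router
```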

    8. Re:Of course you reboot, in controlled settings by Maximum+Prophet · · Score: 1

      If that were true, you'd reboot after every change in /etc/rc* (or whatever's appropriate for your system)

      Instead, most places just have a scheduled reboot. (Which, frankly, works, but still seems like a bad idea)
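
      A hedged sketch of the "reboot after every change" idea: list the boot configuration files modified since the last boot, so a test reboot gets scheduled only when something actually changed. The directory and the boot timestamp are supplied by the caller; where they come from (for example /etc/rc* and the system boot time) is an assumption that varies by system.

```python
from pathlib import Path


def modified_since(directory, since_epoch):
    """Return the regular files under `directory` whose mtime is newer than `since_epoch`.

    `since_epoch` is a Unix timestamp, e.g. the time of the last boot.
    """
    changed = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.stat().st_mtime > since_epoch:
            changed.append(path)
    return changed
```

      A non-empty list is only a hint that boot behaviour may have changed; it does not say the change is wrong.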

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
  7. What a load of BS by kju · · Score: 4, Insightful

    I RTFA (shame on me) and it is in my opinion absolutely stupid.

    There is actually only one real reason given and that is that if you reboot after some services ceased working, you might end up with an unbootable machine.

    In my opinion this outcome is absolutely great. Ok, maybe not great, but it is important and rightful. It forces you to fix the problem properly instead of ignoring the known problems and missing yet unknown problems which might bite you in the .... shortly after.

    Also: when services start being flaky on my system, I usually want to run an fsck. In 16 years of Linux/Unix administration I have found quite a few times that the FS was corrupted without an apparent reason and without anyone having noticed before. So an fsck is usually a good thing to run when strange things happen, and to be able to run it, I nearly always need to reboot.

    I can't grasp what kind of thinking it must be to continue running a server where some services fail or behave strangely. You could end up with more damage than an outage would cause when the reboot does not go through. You just might want to do the reboot at off-peak hours.

    1. Re:What a load of BS by ColdWetDog · · Score: 3, Funny

      You just might want to do the reboot at off-peak hours.

      As someone who tends to work during 'off-peak' hours, I have a special room in Hell just expressly reserved for admins like you (and my admins who apparently are your soul mates). Just thought I'd mention this. You've been warned.

      --
      Faster! Faster! Faster would be better!
    2. Re:What a load of BS by Svartalf · · Score: 1

      So...when do you propose doing this sort of thing, pray tell?

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    3. Re:What a load of BS by Anonymous Coward · · Score: 1

      He'd rather you did it during the day when competent admins are present.

    4. Re:What a load of BS by Anonymous Coward · · Score: 1

      As an admin who frequently must perform system modifications for new technologies and experiments, I have a special room reserved in hell for users like you. Just thought I'd mention this. You've been warned.

    5. Re:What a load of BS by Anonymous Coward · · Score: 0

      I RTFA (shame on me) and it is in my opinion absolutely stupid.

      There is actually only one real reason given and that is that if you reboot after some services ceased working, you might end up with an unbootable machine.

      Oh come on now, the OP says right in his article on InfoWorld that he was caught off-guard being slashdotted - he probably got an ego and/or pay boost for the added traffic and wanted to "keep up the good work" with a follow-up article meant to leech on /.'s notoriety. Cut the hapless cog some slack.

    6. Re:What a load of BS by kiwimate · · Score: 1

      Yep, bunch of nonsense, really. All the alleged "reasons" why you don't reboot Unix servers come down to oh no, woe, worry, scare tactics.... The article should be more properly titled "Why you shouldn't reboot Unix servers without knowing why you're rebooting them". It's actually kind of insulting - the author is coming close to implying that Unix administrators are just as bad as the stereotypical Windows administrators in that they're prone to rebooting without understanding the problem.

      By the way, I've had several Windows servers which went years without a reboot and were quite happily humming along. I can remember one which had been up for a couple of years and I finally had to reboot because it was going flaky and showing disk errors so I scheduled a disk check. Guess what, the RAID card had lost its configuration and the server wouldn't start up after the shut down. That is the kind of thing that you may want to have happen at off-peak hours.

    7. Re:What a load of BS by Anonymous Coward · · Score: 0

      Good thing there's a special room in Hell for you too... although it's just your office at 9am... that would still be Hell...

    8. Re:What a load of BS by Anonymous Coward · · Score: 0

      As someone who tends to work during 'off-peak' hours, I have a special room in Hell just expressly reserved for admins like you (and my admins who apparently are your soul mates). Just thought I'd mention this. You've been warned.

      A lot of our 3rd shift admins feel the same way.

      I see it as a trade-off: y'all have to work all the maintenances that the customers want to happen during off-peak hours, and we have to, you know, actually TALK to the customers. "Talking to customers" is something you tend to forget about when you sleep through US business hours.

      Don't be an asshole.

    9. Re:What a load of BS by Anonymous Coward · · Score: 0

      So, you don't like work interrupting your monitoring of the USA network and eating cheese doodles, What-evs

    10. Re:What a load of BS by Anonymous Coward · · Score: 0

      Your job as an admin is to support other people and the systems they use. If you don't have the hardware to keep them working while you maintain the equipment, then yes, it's part of your job as an admin to do after-hours work. You are a support role (sure, a talented one) and are there to help other people do their job.

    11. Re:What a load of BS by Anonymous Coward · · Score: 0

      Doing an fsck on a large filesystem can lead to a *lot* of downtime that's not required - I've had a machine down for 18 hours after another admin decided to reboot it and fsck. (The problem was a mis-configured new ethernet adapter).

      Working on larger systems, the last thing you want is a shutdown; there's too much that can go wrong. I spent a while working on E10K systems, which Sun engineering put a *lot* of effort into keeping up, including dynamically loadable kernel patching, hot swap/spare everything and backup power systems (Sun pre-sales were hard to convince to sell you one without backup generators being on site).

      In the 5 years I worked on those systems, we had only one shutdown (power work) and Sun engineering were on site for 6 hours making sure it came back (it almost did by itself).

      There really is no need to shut a unix system down in most circumstances and my experience is that when you do it's the time you're most likely to have your failures.

      Bruce

    12. Re:What a load of BS by trolman · · Score: 1

      I did not RTFA but admit to enjoying the threads. The comments from the Unix sysadmins are pretty good.

    13. Re:What a load of BS by Maximum+Prophet · · Score: 1

      There is actually only one real reason given and that is that if you reboot after some services ceased working, you might end up with an unbootable machine.

      The other reason given, is that by using the bandaid of rebooting, you don't find the actual problem.

      In the classroom, finding the "actual" problem is a useful exercise. In the real world, sometimes business reasons trump finding the problem, and a bandaid is exactly the most cost effective solution. Unfortunately, in the real world, many times you don't know which was the best solution until the postmortem.

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
  8. *NIX 101 by Zero1za · · Score: 2

    This is like *NIX 101.

    But then, try changing the locale on a running system...

    1. Re:*NIX 101 by corychristison · · Score: 1

      But then, try changing the locale on a running system...

      This depends on your linux distro... on gentoo:
      # init 3 (assuming you're not ssh'd in)
        - edit /etc/env.d/02locale
      # env-update && source /etc/profile
      # init 5

      and you're good to go. :-)

  9. Reboots by DaMattster · · Score: 1

    By and large there is really no need to reboot a UNIX machine unless you are making a change to the kernel, i.e. an upgrade or a recompile with an added feature. Other than that, the author is correct. I have machines with uptimes of two years. It would have been more had I not had to power the machine down for a physical move.

  10. Ummm, that's a crap article by Sycraft-fu · · Score: 4, Insightful

    More or less it is "You shouldn't reboot UNIX servers because UNIX admins are tough guys, and we'd rather spend days looking for a solution than ruin our precious uptime!"

    That is NOT a reason not to reboot a UNIX server. In fact it sounds like if you've a properly designed environment with redundant servers for things, a reboot might be just the thing. Who cares about uptime? You don't win awards for having big uptime numbers, it is all about your systems working well and providing what they need and not blowing up in a crisis.

    Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any. If you want to claim "You shouldn't reboot," then you need to present technical reasons why not. Just having more uptime or being somehow "better" than Windows admins is not a reason, it is silly posturing.

    1. Re:Ummm, that's a crap article by pz · · Score: 2

      Please point out exactly where in the article the issue of uptime is raised. I fail to see it. Many others have also suggested that long uptimes ("e-pene" as one poster put it) is the reason for avoiding reboots. There has been no such suggestion that I could find. I authored a post to the previous thread about the origins of the Unix attitude against reboots that was highly rated and nowhere in that post, or in the follow-on replies, was uptime ever considered an issue.

      The issue -- the only issue -- is interrupting service to many users. Modern machines that serve tens to thousands of users cannot be brought down willy-nilly without incurring the wrath of those users, and rightfully so. Bringing down a system because the sysadmin was too lazy to understand what the problem was is inexcusable. The sysadmin's job is to keep the service running. When there's one user, such as in QA, or a single-user desktop, reboots can happen at will. When there are many many users, such as in a production box, file server, or similar, reboots should never be used as a problem-solving tool.

      So let go of the old, dead horse about uptime bragging rights. A correct, properly maintained Unix system does not need to be rebooted except under highly unusual circumstances. The reason that Windows boxes are treated differently is because Windows is a comparatively new OS that started out life as a one-seat system whereas, paraphrasing what I wrote in an earlier post, Unix and its intellectual antecedents had been running multi-seat systems for nigh on three decades before Windows started doing that. It's fact, not being better or worse, and the Unix and Windows cultures have grown around those two views.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    2. Re:Ummm, that's a crap article by GreyLurk · · Score: 1

      I don't think the article was expounding on Unix "Manliness" and uptime metrics... It mostly just highlighted the mistake that a lot of junior admins make (both Windows and Unix): believing that it doesn't matter if you understand why the problem is happening, and that just mashing the power button until it goes away is the best route forward.

      Rather than presenting technical reasons why you shouldn't reboot, it's actually probably better to ask for technical reasons why you *should* reboot. Rebooting a server to try and fix a problem is just one step above "percussive maintenance" in the hierarchy of problem solving.

      Now, I will actually suggest one additional reason to reboot a Unix Server not mentioned in the article, and that is installing a new service that's intended to be included in the boot up sequence. However, that suggestion is just a Quality Assurance measure: Make sure that the service powers up as it's supposed to, in case some unexpected downtime does happen, and ensure that the service comes back up as expected. Otherwise, Hardware and Kernel upgrades should be the only reason for a reboot.

    3. Re:Ummm, that's a crap article by Jose · · Score: 2

      Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any.

      hrm, the article states: ...If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing. That's what was causing the segfaults and the wonky behavior. But since you rebooted the server without digging into the problem, you've made it much worse, and you'll soon boot a rescue image -- with all kinds of ponderous work awaiting you -- while a production server is down.

      and:
      In many cases, it's extremely important not to reboot, because the key to fixing the problem is present on the system before the reboot, but will not be immediately available after. The problem will recur, and if the only known solution is to reboot, then the problem will never be fixed unless or until someone decides not to reboot and instead tries to find the root of the problem.

      and while I disagree with this one slightly..as the problem may still be present after a reboot..I definitely agree with what the author is saying...find the actual root of the problem, and fix it..don't just cross your fingers and hope a reboot will fix the problem.

      Also the author never mentions preserving uptime of the server as a goal..he does mention a few times patching in place..which will mean killing services, effectively making that particular server unavailable.

      --
      The basic sleazeware produced in a drunken fury by a bunch of UCBerkeley grad students was still the core of BIND. --PV
    4. Re:Ummm, that's a crap article by Gaygirlie · · Score: 1

      If you want to claim "You shouldn't reboot," then you need to present technical reasons why not.

      1) You should first find out what is broken and why. Rebooting without doing that only means that it may happen again and perhaps with catastrophic results.
      2) If you have found out the reason then you can fix it even without rebooting in almost all cases.
      3) Depending on the issue you might render your system unable to boot if you restart without checking first and then the system will be offline even longer than necessary.

      Tbh, all these sound very reasonable reasons to my ear. As I said in another comment they are all things that any admin worth his/her salt should know about already, but it still doesn't make them unreasonable.

    5. Re:Ummm, that's a crap article by jimicus · · Score: 1

      There is the issue that rebooting things frequently doesn't solve anything.

      It's a hell of a lot slower than just restarting a single application, and if restarting the application does not fix it but a reboot does, there's something wrong with how you're restarting the application.

      Others have already raised the issue that it deletes forensic data - quite correct, if your application has a tendency to write a corrupt file to /tmp, read it back later and crash - and your reboot process clears out /tmp - you'll never know.

      Finally, Unix admins as a rule tend to try to resolve the underlying problem rather than just reboot and forget about it. That's why we have log files which can be drilled through very quickly with tools like grep rather than an event log which 60%-80% of developers don't appear to even know exists.
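
      On the forensic-data point, one low-tech habit is to copy the suspect files somewhere that survives the reboot before pulling the trigger. A minimal sketch, assuming the evidence consists of ordinary files and directories (sockets and FIFOs under /tmp would need separate handling), that the sources have distinct base names, and that the destination lives on a filesystem the reboot does not clear; the function name and directory layout are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def snapshot_evidence(sources, dest_root):
    """Copy each existing source into a fresh timestamped directory under dest_root.

    Returns (snapshot_dir, list_of_copied_paths). Missing sources are skipped.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(dest_root) / f"evidence-{stamp}"
    dest.mkdir(parents=True)
    copied = []
    for src in map(Path, sources):
        if not src.exists():
            continue  # nothing to preserve for this entry
        target = dest / src.name
        if src.is_dir():
            shutil.copytree(src, target, symlinks=True)
        else:
            shutil.copy2(src, target)  # copy2 keeps timestamps where possible
        copied.append(target)
    return dest, copied
```

      Something like `snapshot_evidence(["/tmp/app-cache", "/var/log/app.log"], "/var/preserve")` (paths hypothetical) before the reboot means the corrupt file is still there to look at afterwards.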

    6. Re:Ummm, that's a crap article by Anonymous Coward · · Score: 0

      Hate to be in a position to defend the article. But I'm not sure how this post got Insightful 4; people with mod points haven't RTFA'd. Nowhere does it say anything about uptime, or even imply that is the reason to not reboot. And he does present reasons to not reboot. The article stated 3 technical reasons not to reboot.

      Case 1) /boot was deleted and the server won't come back from a reboot. A case where the problem would be easier to fix and take less time without rebooting.

      Case 2) Loss of forensics data when you reboot.

      Case 3) Everything seeming fine and working after the reboot can lead to a case where the root problem never gets fixed. Rebooting fixes it when some service locks up, so you end up living with monthly lock up and reboot cycles. Or whatever the period is.

      That said the article was over simplified and not really worth the read.

    7. Re:Ummm, that's a crap article by onefriedrice · · Score: 1

      Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any. If you want to claim "You shouldn't reboot," then you need to present technical reasons why not. Just having more uptime or being somehow "better" than Windows admins is not a reason, it is silly posturing.

      It's clear to me you didn't actually read the article, because there was [one] very good technical reason why rebooting as a troubleshooting technique is a Bad Idea, and it was even on the front page: If the system is having problems and you don't know what's wrong, then the same problem may also prevent the system from coming back up cleanly (or at all). Worse, the reboot may have destroyed some evidence as to how the problem started (i.e. in the form of temporary files). You're now probably not any closer to diagnosing the problem, and you may be in a less convenient position to make the actual repairs needed.

      In fact, the article doesn't even mention "precious uptime" as a reason not to reboot, so most (all?) of your speculation doesn't apply. Furthermore, the subtitle of the article is fairly accurate, whether you're offended by it and consider it posturing or not: "Rebooting Windows boxes is a way of life, but rebooting by default can get you nowhere fast when running Unix." Rebooting really is a troubleshooting step most Windows admins follow, as far as I've seen. There are good reasons to reboot a Unix server (let uptime be damned), but rebooting just because you don't know what's wrong is not smart, and that's what the article is actually about, according to someone who actually read it. I also happen to agree with most of the article.

      --
      This author takes full ownership and responsibility for the unpopular opinions outlined above.
    8. Re:Ummm, that's a crap article by RyuuzakiTetsuya · · Score: 0

      What's worse for your SLA, slogging through piles of /var/log dumps or rebooting an already unresponsive machine?

      --
      Non impediti ratione cogitationus.
    9. Re:Ummm, that's a crap article by Svartalf · · Score: 1

      Depends on whether you can postmortem the cause or not. If it's happening periodically, your SLA means little as you're having more downtimes spread out over time (which can be as bad or worse than the "longer" perceived downtime) and the users take to talking about the unreliable system.

      It's a fine line, really, that should probably lean at least a little towards what he's talking to, though he's taking it to an extreme.

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    10. Re:Ummm, that's a crap article by Culture20 · · Score: 1

      Rebooting a server to try and fix a problem is just one step above "percussive maintenance" in the hierarchy of problem solving.

      Please tell me you didn't just knock percussive maintenance. If the Fonz can use it on a 45 rpm record player, I can use it on a 10,000rpm disk. Although toaster ovens usually work better for getting a good spin-up.

    11. Re:Ummm, that's a crap article by Blakey+Rat · · Score: 1

      That is NOT a reason not to reboot a UNIX server. In fact it sounds like if you've a properly designed environment with redundant servers for things, a reboot might be just the thing. Who cares about uptime? You don't win awards for having big uptime numbers, it is all about your systems working well and providing what they need and not blowing up in a crisis.

      Bingo. The importance is the uptime of the system, not any one individual server. If you only have one server composing "the system", then you already don't care about uptime. If you have multiple servers comprising it, then regularly rebooting individual servers is a good idea.
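
      A rough sketch of what "regularly rebooting individual servers" can look like when the system is redundant: take one node at a time and stop at the first one that does not come back healthy, so the rest keep serving. The `reboot` and `is_healthy` callables are placeholders the caller supplies; nothing here is specific to any real tool.

```python
def rolling_reboot(servers, reboot, is_healthy):
    """Reboot servers one at a time, halting at the first unhealthy node.

    Returns (rebooted_ok, failed_server); failed_server is None on full success.
    """
    rebooted_ok = []
    for server in servers:
        reboot(server)
        if not is_healthy(server):
            # Leave the remaining nodes untouched so the service stays up.
            return rebooted_ok, server
        rebooted_ok.append(server)
    return rebooted_ok, None
```

      Stopping early is the design choice that matters: a boot-time problem then costs one node during a planned window instead of the whole pool.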

    12. Re:Ummm, that's a crap article by Anonymous Coward · · Score: 0

      Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any.

      hrm, the article states: ...If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing.

      Strawman argument, if there is an admin deleting all of / you will have problems and it actually does not matter if you reboot or not. Maybe rebooting is even better so the script is killed during the shutdown before it continues from /boot and /bin on to /home...

      In many cases, it's extremely important not to reboot, because the key to fixing the problem is present on the system before the reboot, but will not be immediately available after. The problem will recur, and if the only known solution is to reboot, then the problem will never be fixed unless or until someone decides not to reboot and instead tries to find the root of the problem.

      The only real argument I could find in the article.

  11. HP-UX says... by RedK · · Score: 2

    You lie.

    Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.

    --
    "Not to mention all the idiots who use words like boxen."
    Anonymous Coward on Monday August 04, @06:49PM
    1. Re:HP-UX says... by inflex · · Score: 2

      NFS is designed to be like that: block/hang until the connection is restored... though I'm not sure about the resilience to sig-9. You do now have the option on some NFS systems to have a soft-block.

    2. Re:HP-UX says... by RedK · · Score: 2

      I've had systems with HP-UX that could rpcinfo/showmount on the NFS server and yet still had hung filesystems. Soft, hard, whatever mount option, it's random. Then when you try to shut down the NFS subsystem, the rpc processes get stuck, you try to kill -9 and they simply don't die. umount -f doesn't work. Nothing works.

      You really have to have experience on HP-UX to understand the pain... And if only I was talking about the old 11iv1 instead of the brand spanking new 11iv3 with ONCplus up to date.

      --
      "Not to mention all the idiots who use words like boxen."
      Anonymous Coward on Monday August 04, @06:49PM
    3. Re:HP-UX says... by sribe · · Score: 4, Informative

      Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.

      Kind of a well-known, if very old, problem. From Use of NFS Considered Harmful:

      k. Unkillable Processes

      When an NFS server is unavailable, the client will typically not return an error to the process attempting to use it. Rather the client will retry the operation. At some point, it will eventually give up and return an error to the process.
      In Unix there are two kinds of devices, slow and fast. The semantics of I/O operations vary depending on the type of device. For example, a read on a fast device will always fill a buffer, whereas a read on a slow device will return any data ready, even if the buffer is not filled. Disks (even floppy disks or CD-ROM's) are considered fast devices.

      The Unix kernel typically does not allow fast I/O operations to be interrupted. The idea is to avoid the overhead of putting a process into a suspended state until data is available, because the data is always either available or not. For disk reads, this is not a problem, because a delay of even hundreds of milliseconds waiting for I/O to be interrupted is not often harmful to system operation.

      NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. A program using data on an NFS mount, however, can remain in an uninterruptable state until a final timeout occurs.

      Workaround: Don't panic when a process will not terminate from repeated kill -9 commands. If ps reports the process is in state D, there is a good chance that it is waiting on an NFS mount. Wait 10 minutes, and if the process has still not terminated, then panic.
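
      A minimal sketch of the "is it in state D?" check from the workaround above, assuming a Linux client where process state is exposed through /proc/<pid>/stat (other Unixes, HP-UX included, would need ps instead). The function names are illustrative and not taken from the quoted article.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST closing parenthesis.
    left, right = stat_line.rsplit(")", 1)
    pid_text, command = left.split("(", 1)
    state = right.split()[0]
    return int(pid_text), command, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # assumption: no procfs here, so nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                pid, command, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # process exited (or was unreadable) mid-scan
        if state == "D":
            stuck.append((pid, command))
    return stuck


if __name__ == "__main__":
    for pid, command in uninterruptible_processes():
        print(pid, command)
```

      State D only says the process is in uninterruptible sleep; it does not prove NFS is the cause, so treat any hit as a hint to go and look at the mounts.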

    4. Re:HP-UX says... by Anonymous Coward · · Score: 0

      Very old version? I have 11i v1 and v2 servers with 2-3 NFS mounts that have 500+ days of uptime - no issues at all.

    5. Re:HP-UX says... by tscheez · · Score: 1

      happens in linux too, if you forget to add intr to your nfs mount command.....

      --
      Supplies!
    6. Re:HP-UX says... by sprag · · Score: 1

      10.10 and 10.20 sucked pretty hard.

      Does it still make you reboot after (nearly) every patch when updating?

    7. Re:HP-UX says... by Svartalf · · Score: 2

      That's why I'm all for coming up with something OTHER than NFS for server framework. Seriously. And using it in a HP/HA cluster is...verging on insane... It's an old crufty design that was designed for use in a simpler time with simpler conditions- and it wasn't all that great then.

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    8. Re:HP-UX says... by Heretic2 · · Score: 1

      Zombie processes are typically the faults that result from software errors. In this case, there is an error in the kernel implementation of NFS. This is indicative that you need to patch your kernel/install a new one.

    9. Re:HP-UX says... by dkf · · Score: 1

      NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. A program using data on an NFS mount, however, can remain in an uninterruptable state until a final timeout occurs.

      That reminds me of an old NFS server back where I used to work, many years ago. That server was serving the main applications partition (/usr IIRC) to a few hundred Sun machines across the department, and to cut a long story short, it was feeling "poorly". I don't know what exactly was wrong hardware-wise, but the net effect was that the machine would reboot from time to time and all of those few hundred client machines would screech to a halt while waiting for almost the whole of their user-space to become available once more. Ouch.

      For added fun, it turned out that the machine would do its unscheduled reboot and decide that the lack of a clean shutdown meant that it had to fsck. On the hardware of the time, this was around 30 minutes. Double ouch.

      With the reboot done, the nfsd would start again and advertise that the mounts were all available once more. At that point, the 300 or so client machines would all jump in at once to page in all the applications that had been waiting. This would cause the load on the server to spike massively, and trigger the hardware problem again after 30 to 60 seconds or so. That's when it would trigger the whole damn cycle again. Quadruple ouch!

      On the plus side, it made it very easy to persuade the head of the department to spring for a whole cluster of brand new fileservers. It's an ill wind that blows no good at all.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    10. Re:HP-UX says... by Anonymous Coward · · Score: 0

      try sshfs (on linux it's via fuse)

    11. Re:HP-UX says... by Anonymous Coward · · Score: 0

      Woah so in the couple of decades or so since NFS has been around no-one thought to fix the kernel?

    12. Re:HP-UX says... by Anonymous Coward · · Score: 0

      That would be a bug, any good UNIX admin would get a crash dump and then whack HP-UX upside the head with it until they fix it...simples.

      Of course most people don't know how to be tough with vendors anymore, and yes I have been a software vendor in the past.

    13. Re:HP-UX says... by Anonymous Coward · · Score: 0

      I've had that happen on HP-UX as well, going back to at least 10.20. Their methodology for mounting CDs was just odd, and when it failed - which was early and often, and there were lots of patches trying to fix it - the process ownership would switch to 1.....let alone the nfs processes - and you couldn't mount a CD without removing the device (if you could do that) and re-adding it or else cycling the box. Blech.

  12. Virtualization to the rescue by Anonymous+Showered · · Score: 4, Interesting

    I run web servers for a few dozen clients, and rebooting a remote machine was always scary. There was the possibility that something might not boot up during startup (e.g. SSHd) and I would be locked out. I would then have to travel to my data center downtown (about 30 minutes away) and troubleshoot the problem. Since I don't have 24/7 access to the DC (I don't have enough business with the DC to warrant an owned security pass...) I have to wait until they open to the general clientèle in the morning.

    With ESXi, however, I'm not that scared anymore. If something does go wrong, I have a console to the VM through vCenter client (the application that manages virtual machines on the server). It's happened once where a significant upgrade of FreeBSD 7.2 to 8.1 was problematic. Coincidentally, it was because I didn't upgrade the VMware tools (open-vmware-tools port). Nonetheless, I managed to fix the problem through vCenter.

    This is why I love virtualization in general. It's making managing servers easier for me.

    1. Re:Virtualization to the rescue by inflex · · Score: 1

      It's why a "good" server has a lights-out system in it that lets you gain access to the machine as it boots as if you were there with a keyboard/console.

      Of course, yes, the VM-route is nice, I do that too now ( so long as you don't mess up the host :D ).

    2. Re:Virtualization to the rescue by Spad · · Score: 1

      Not to mention the joy of snapshots.

    3. Re:Virtualization to the rescue by jimicus · · Score: 1

      That rules out Dell's remote access then. It demands a specific browser version and a specific Java version, and won't fire up the console unless both are present. Dell also have a nasty habit of not updating the browser/Java version detection code that often.

      Of course, you only discover/remember this a week after you've updated everything to IE 8|FireFox 3.6|(browser of choice) and patched Java to the latest version.

      Mind you, what do you expect from the Ryanair of hardware vendors?

    4. Re:Virtualization to the rescue by sjames · · Score: 1

      Virtualization has its place and that's one of them. I'll point out though that a recent server with IPMI gives you those benefits as well. It seems that the kinks are finally ironed out of the shared network connections. Just avoid the ones that insist on looking like a web server pushing java apps at you, they tend to fail miserably if you need to tunnel to the management network.

    5. Re:Virtualization to the rescue by Rich0 · · Score: 1

      Agreed on this one - best of both worlds. Go ahead and reboot - and if that turns out to be a mistake just restore the snapshot and you didn't reboot after all... :)

    6. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      Coincidentally, it was because I didn't upgrade the VMware tools....This is why I love virtualization in general. It's making managing servers easier for me.

      That's pretty funny. BTW, do your hosts not have their management console ports configured? Why can't you get to those remotely?

    7. Re:Virtualization to the rescue by Mad+Merlin · · Score: 1

      I'm the last person in the world to defend Dell, but their servers do ship with standard IPMI implementations that you can connect to with anything that speaks IPMI. I'm not sure what this browser nonsense is all about.

    8. Re:Virtualization to the rescue by jimicus · · Score: 1

      IPMI is great if you can get by with a serial console, which obviously is trivial enough to set up under Linux (even if most distros don't do it automatically). Or, for that matter, if the only thing you really need to do remotely is power-cycle it because you know it'll come up OK.

      Not, however, terribly useful for Windows Server.

    9. Re:Virtualization to the rescue by pwinkeler · · Score: 1

      Sure - reboot away; but how often will you be rebooting your ESXi server? I mean, it is exposed to the same sources of bugs that Linux is (i.e. Humans). My point here is that once you start to give in to the reasoning that you should reboot on a regular basis you might also start wondering whether the architect is about to get ready for his regular reboot of our universe.

      Remember now: it *IS* turtles all the way down!

      --
      PaulW, IT Consultant
    10. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      Until you reboot the host server and can't get remote access to *any* of your servers.

    11. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      Before virtualization, people used a serial console with a networked power supply.

    12. Re:Virtualization to the rescue by cortana · · Score: 1

      You forgot to mention the existence of serial consoles in your eagerness to shill for VMWare.

    13. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      This is why I love virtualization in general. It's making managing servers easier for me.

      Which is fine, if you don't need real access to the ESXi console in an emergency. I got effectively locked out of my own ESXi box (a dell R510) because the management agent shat itself, and I didn't leave remote SSH access on by default. All the VM's were happy, but I couldn't manage them at all short of in band remote desktop or VNC or SSH. Thankfully I had enough foresight to enable IPMI access and was able to remote power cycle the bastard after shutting down all the VM's I could. I may have been bitten by an ESXi Broadcom firmware bug, because I had another R510 die on me exactly the same way locally around the same time, and even with local console access, I couldn't get the management agents to recover into a usable status even though they were running. Only a full restart fixed remote access, otherwise I was stuck with a headless zombie server.

      It's crap like this that makes having remote power cycling/IPMI/remote KVM console access a godsend.

      But avoiding restarts is a valid stance. Too bad VMware makes most firmware (aka driver) patches a mandatory reboot event, even for patches containing exclusively drivers you aren't even using. Which is a different pack of fail, but let's not go there.

    14. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      You could use KVM-over-IP on your servers to achieve the same functionality you describe.

    15. Re:Virtualization to the rescue by GPLHost-Thomas · · Score: 1

      What you need here is a KVM over IP and IPMI reboots, because even your "main os" that does the virtualization could fail, and then you'd be back to square 1...

    16. Re:Virtualization to the rescue by Anonymous Coward · · Score: 0

      Ditto. Not VMware, but close. I have machines of the Linux-VServer host variety located in 5 different cities in Germany. The services on the host are always minimal. Fail2ban, shorewall/iptables, and, of course, the kernel. The vservers themselves are various and sundry. Gentoo, Debian, CentOS. If I have a problem with a service in the vserver, it's never a problem to arrive at the host (well, almost never). vserver xxx stop/start. Hmmm.

      Rebooting the host is really as good as never needed since it runs as few services in as clearly predetermined a way as you can design. The vservers themselves (regardless of distro), I know like the back of my hand since I use a deployment infrastructure (puppet++ if you care)....

      Far from reducing my range of skills as an administrator, the longer I've used virtualized linux the MORE I've learned and the more readily I recover from service problems.

  13. I read TFA by pak9rabid · · Score: 2, Interesting

    What a load of horse shit.

    1. Re:I read TFA by Captain+Centropyge · · Score: 1

      Thank you for contributing generously to the conversation...

      --
      Bite my shiny metal ass!
    2. Re:I read TFA by Tranzistors · · Score: 1

      And your analysis is insightful.

    3. Re:I read TFA by FormOfActionBanana · · Score: 1

      Amen brother!

      --
      Take off every 'sig' !!
    4. Re:I read TFA by trolman · · Score: 1

      What a load of horse shit.

      We appreciate that you 'took one for the team.' At least the discussion is good. Classic 90's slashdot!

      For the record I did not RTFA. Having posted three comments and reading on -1, without RTFA, I have a smug satisfied feeling.

  14. Library upgrades are the tricky part by Anonymous Coward · · Score: 2, Informative

    Often system upgrades (eg. security fixes) include new versions of libraries and such. It's impossible for the package manager to know which processes are using those libraries so it can't automatically restart everything. Consider if you have custom processes running, the package manager wouldn't even know about them.

    Therefore you have to do it manually, but then you have the same problem. It's damn hard to know which processes are using the libraries that were upgraded. Really, really hard if it's a big server running hundreds or thousands of processes. Often it's easier just to reboot so you make sure everything is running the current version of all the libraries. If you don't then you can't be sure that all the security fixes are actually running on the system since it will be using the old cached versions of the libraries in RAM.

    1. Re:Library upgrades are the tricky part by jimicus · · Score: 1

      man lsof
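
      As a rough sketch of the same idea without lsof, assuming a Linux box: when a package upgrade replaces a shared library on disk, processes still running the old copy usually show it in /proc/<pid>/maps with a trailing " (deleted)". The helper names below are made up for illustration, and reading other users' maps files generally needs root.

```python
import os

DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text):
    """Return sorted paths of deleted shared libraries in one maps file."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        # Heuristic (an assumption): anything with ".so" in its name
        # is treated as a shared library.
        if path.endswith(DELETED_SUFFIX) and ".so" in path:
            stale.add(path[: -len(DELETED_SUFFIX)])
    return sorted(stale)


def processes_needing_restart(proc_root="/proc"):
    """Map pid -> stale library paths for every readable process."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        maps_path = os.path.join(proc_root, entry, "maps")
        try:
            with open(maps_path, errors="replace") as handle:
                libraries = stale_libraries(handle.read())
        except OSError:
            continue  # process exited or we lack permission
        if libraries:
            result[int(entry)] = libraries
    return result


if __name__ == "__main__":
    for pid, libraries in sorted(processes_needing_restart().items()):
        print(pid, ", ".join(libraries))
```

      Anything this lists is a candidate for a service restart rather than a full reboot; an empty result is not proof that every process is on the new libraries, only that no deleted mappings were visible.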

  15. Not better than the others by cpct0 · · Score: 2, Interesting

    Quotes from stupid people:
    You should never reboot a Mac, it's not like Windows.
    You should never reboot Unix/Linux, it's not like Windows.

    Well, you shouldn't reboot Windows either. You reboot it when it goes sour. Our Windows servers seldom go sour, so we don't reboot them. Same for Mac or *nix.

    Problem is when it starts to cause problems. Like our /var/spool partition deciding it has better things to do than exist... or the ever so important NFS or iSCSI mount that decides to Go West, and gives us the ??? ls we all dread ... with umounting impossible, so remounting impossible, and all these stale files and stuff. You either tweak these things for hours cleaning up all processes, or you reboot.

    In fact, being a good sysadmin, all my servers are MEANT to be rebooted if something goes sour. One SVN project goes sour? check if it's not the repository itself that got problems, or if the system needs to save something to safely exist ... and if not, reboot the server. Everything magically restarts itself, does its little sanity check, and a quick look at a remote syslog to make certain everything is all right. 2 minutes lost for everyone, not 3 hours of trying to clean up mess left by some stray process somewhere or trying to kill the rogue 100 compression and rsync jobs that got started eating up all RAM, CPU and network.

    Since all our servers are single processes and are either VMs or single machines, it's a breeze to do this. iSCSI will diligently wait before the machine is back up before trying to reconnect. NFS will keep its locked files up, and will reconnect to them. No, seriously, everything simply reconnects!

    Of course, the idea is to minimize these occurrences, so we learn from it, and we try to repair what could've caused this problem in the first place. And there's a place to do this in a server crash postmortem. But no need to make users wait while we try to figure out wth.

    1. Re:Not better than the others by Anonymous Coward · · Score: 0

      We don't reboot our windows servers... we don't have to... we let the windows systems crash on their own from: Viruses, dynamic updates, admins doing things on the server they shouldn't, Virus checkers run amok, resource contention/races, etc.

      Where did the UNIX mentality of not rebooting come from ? The old guys that ran the Mainframes... it wasn't that they were afraid of reboots, but they kept their hand on the wheel and 'knew' what was going on inside their systems. Windows servers are analogous to breeding rabbits; just let them run until they die and replace them..

    2. Re:Not better than the others by Tom · · Score: 1

      In fact, being a good sysadmin, all my servers are MEANT to be rebooted if something goes sour.

      You and I have different work philosophies.

      Yes, rebooting is the fastest way to bring a system from an unknown state back to a known state.

      But a system should never enter an unknown state. Every event of that type is the result of a failure. Sometimes, it happens, IT is complicated like that. But it shouldn't happen.

      My philosophy is that every server should come up clean and running all its services if it ever gets rebooted for whatever reason - no handholding. At the same time, it should not ever require a reboot. If it does, something went horribly wrong.

      --
      Assorted stuff I do sometimes: Lemuria.org
    3. Re:Not better than the others by Just+Some+Guy · · Score: 1

      In fact, being a good sysadmin, all my servers are MEANT to be rebooted if something goes sour. One SVN project goes sour? check if it's not the repository itself that got problems, or if the system needs to save something to safely exist ... and if not, reboot the server.

      I'm sorry, but you misspelled "inexperienced". I have nothing against rebooting and will cycle a machine when appropriate, but if you're having to reboot an SVN server, you've done fucked up. There's something misconfigured that's allowing it to enter states it should never be in, and you'd be far better served doing a little root cause analysis to find out what's happening so you can stop it from happening next time.

      In general, then: when you have to reboot, find out why you have to reboot and fix it. Your way is easier in the moment, but scales horribly and makes a lot more work for you than it should.

      --
      Dewey, what part of this looks like authorities should be involved?
    4. Re:Not better than the others by cpct0 · · Score: 1

      And I agree with you totally it should not require a reboot, reboots should not happen.

      Like in Windows ...

      But in real life, they do. And sometimes, once it's determined what the cause might be, it's much easier to do a "shutdown -r 0", wait for 30 seconds for the server to stop, wait for 2 minutes for the server to come back up, look at the log, see everything is peachy, and THEN solving whatever went wrong than taking 2 hours of my life with users poking me every 30 seconds because the server is down.

      Example that just happened today: a NFS server was having problems with some actions... not all, but some very rare admin actions. We restarted the server because it was hung, with stale NFS mounts galore, and application hung because it was unresponsive.
      Rebooting the whole system meant all processes at least tried to stop, and then they restarted correctly. After further investigation with whatever was causing problems, we understood one of our new servers needed nolock in clients due to a software glitch. We remounted with additional options, and it kept running along from that point on. Next step is to do the right thing and correct the NFS software bug ... and once we'll have the latest software update, we'll allow locks yet again.

      Total downtime: 5 minutes. Caveat: for 2 hours, users couldn't do a particular Admin command. Now: fine ... Eventually: server will be back up and ok.

    5. Re:Not better than the others by cpct0 · · Score: 1

      It scales horribly
      It takes more work
      It takes more time down the end ... but then, I don't have a %(#)load of users sending me e-mails, calling me (off-site), coming to see me in real life, and I don't have to send a message to all the clients and users saying it's known, and I don't have the company managers telling me they're in a crunch or whatnot.

      95% of users coming to my desk in these mere minutes a server is down:
      - hi
      - hi. Yes?
      - how are you?
      - not too bad, you?
      - going well. I got a question for you?
      - yes?
      - Are you working on whatever server?
      - yeah, it's down. Restarting it now, there might be further downtime later on if I see a solution, and it might require further downtime. Hopefully not.
      - oh ... ok, thanks
      - no sweat.

      So I'm sorry, I'm really inexperienced I guess, I just work with active servers with users happily churning in them, and are to a point mission critical, for them at least. And FWIW, I'm merely a non-jerk and non-God-complex sysadmin that tries to answer to users with proper English and not shoving their little problems down their %/%/.

      And yes, SVN might sometimes cause problems, especially when linked to these buzzwords: SASL, LDAP, NFS, VM and 1TB worth of data.

  16. Oh, this fool again by Enry · · Score: 2

    While it's true servers don't need to be restarted as often as Windows counterparts, there are valid reasons for restarting a server:

    - new kernel, new features
    - new kernel, new security patches (yes, these are distinct reasons)
    - ensure all services restart in the event of a real failure
    - we have cases where memory fills and the system starts thrashing. It may cure itself eventually, but you can't get in via SSH or console (and no, the OOM killer doesn't kick in).

    I think item #3 is important. If you have a crusty system that's been in place for a while and it reboots for some reason, you now have to spend time to make sure everything started, figure out what didn't start, and why. This doesn't mean you need to restart once a week, but every 6-12 months is certainly reasonable.
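
    A minimal sketch of the "make sure everything started" check from item #3, assuming the services you care about listen on TCP ports. The service names and port numbers are hypothetical placeholders; substitute whatever the box is actually supposed to run.

```python
import socket


def service_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_services(expected, host="127.0.0.1"):
    """Return names of expected services that are not accepting connections."""
    return [
        name
        for name, port in expected.items()
        if not service_is_listening(host, port)
    ]


if __name__ == "__main__":
    # Hypothetical inventory: adjust to the services this host should run.
    expected = {"sshd": 22, "smtp": 25, "http": 80}
    down = failed_services(expected)
    if down:
        print("not listening after reboot:", ", ".join(down))
    else:
        print("all expected services are listening")
```

    A successful connect only shows that something is listening on the port, not that the service behind it is healthy, so this is a first pass after a planned reboot rather than a substitute for reading the logs.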

    1. Re:Oh, this fool again by Torodung · · Score: 1

      In all those cases, you've done a thorough RCA, and you know that a reboot is a necessary part of the procedure, or is part of your RCA/auditing. (Checking that your services launch properly for disaster recovery purposes is a great example). IMO, you have met one bare minimum requirement for real competency as an admin. Not everyone who calls himself an "admin" meets that bar, however.

      All I got from the article was "don't reboot until you've done an RCA and a reboot is actually indicated, either as part of the diagnostic process, or because it is necessary."

      I don't understand why you're calling him a fool for giving sound, beginner advice. My reaction to the article was, "duh." I'm guessing he just had space to fill, and nothing interesting to write. If I were to get angry at anyone, it's whomever in the Firehose thought this was a suitable article for readers of Slashdot.

    2. Re:Oh, this fool again by sjames · · Score: 2

      Did you RTFA? He did NOT say NEVER boot, just that it is not a valid troubleshooting step (it MAY be part of a valid solution to a diagnosed problem). He explicitly named your first 2 points as good reasons to reboot. The 3rd is a bit of a rookie mistake, but as long as sshd and basic networking starts the rest can be resolved in the unlikely case that a reboot does happen. Arguably, that's a test procedure and not a troubleshooting solution.

      The thrashing case is one he didn't mention. It is sometimes the only way to get control of a machine. I would point out though that it isn't a troubleshooting step, it's an unfortunately necessary solution to the immediate problem which is preventing you from properly investigating the real problem (what made it thrash).

      The real point though is that the Windows admin seems to think reboot is the first thing to try and if it comes back up it will be left at that (until tomorrow when the same problem comes up again. Lather, rinse, repeat.) while the Unix admin considers it to be a last resort and then preferably done based on a completed diagnosis as a considered resolution to the problem.

    3. Re:Oh, this fool again by Anonymous Coward · · Score: 0

      The real point though is that the Windows admin seems to think reboot is the first thing to try and if it comes back up it will be left at that (until tomorrow when the same problem comes up again. Lather, rinse, repeat.) while the Unix admin considers it to be a last resort and then preferably done based on a completed diagnosis as a considered resolution to the problem.

      Nice hyperbole... Only the most amateur Windows server admin would reboot and "[leave it] at that", particularly as a first step. If one of my team rebooted a production server (Windows or RHEL) as a troubleshooting step, they'd be on desktop support duties quicker than you can say newbie.

    4. Re:Oh, this fool again by sjames · · Score: 1

      Sure, not all Windows admins are like that, but there is a reason Windows admins used to be called "reboot technicians". It is far more common in the Windows world. The ability to avoid rebooting is more recent in Windows as well.

      You are in a mixed technology shop which tends to have more knowledgeable people.

  17. This is a myth? by pclminion · · Score: 4, Interesting

    I've heard a lot of myths. I've never heard a myth stating "You need to reboot a UNIX system to fix problems." If anything I've heard the opposite myth. Who promulgates this shit?

    I do remember ONE time a UNIX system needed a reboot. We (developer team) were managing our own cluster of build machines. The head System God was out of town for two weeks. We were having problems with a build host, and tried everything. Day after day. Finally, on the last day before System God was due to return, it occurred to me that the one thing we hadn't tried was to reboot the machine. The reboot fixed the problem, whatever it was.

    I felt stupid. One, for not figuring out the problem in a way that could avoid a reboot. Two, for not recording enough information to determine root cause in a post-mortem analysis. Three, for configuring a system in such a way that a reboot might be required in order to fix a problem.

    To this day I believe that reboot was unnecessary, although at the time it was the fastest way to resolving the immediate blocking issue.

    1. Re:This is a myth? by Anonymous Coward · · Score: 0

      So you wasted two weeks because you didn't want to reboot. You're fired.

    2. Re:This is a myth? by Anonymous Coward · · Score: 1

      Telcos.
      Scheduled reboots are one thing, but I was appalled to learn that the #1 trouble-shooting step Avaya techs propose for their Solaris/Sparc and Linux/x86 based systems is "reboot it. Do it once a week, every week without fail". Turns out Avaya's CMS system is rigged to lock up if not rebooted, and fixing it voids your support contract.

      I'm pretty sure that at the telco at least, it started from that, and became common practice to just reboot everything. Servlet container? Reboot it. PBX is intermittently dropping calls? Reboot it. CMS locked up? Reboot it. Avaya just doesn't want third party subcontracted admins coming in and fixing things, presumably because despite our admittedly ridiculous rates, we're still significantly more cost effective than their 6-7 figure support contracts.

    3. Re:This is a myth? by Anonymous Coward · · Score: 0

      I've heard a lot of myths. I've never heard a myth stating "You need to reboot a UNIX system to fix problems." If anything I've heard the opposite myth. Who promulgates this shit?

      InfoWorld does.

  18. Unnecessary Windows reference by Anonymous Coward · · Score: 0

    I just don't get why it was necessary to make reference to Windows here. Most of the legit reasons not to reboot a Unix box he listed apply to Windows and its analogous subsystems too.

  19. Sometimes ... by PPH · · Score: 4, Funny

    ... the crap I read on Slashdot is so unbelievable, I have to reboot my laptop in the hopes that it will go away.

    --
    Have gnu, will travel.
    1. Re:Sometimes ... by The+Wild+Norseman · · Score: 1

      ... the crap I read on Slashdot is so unbelievable, I have to reboot my laptop in the hopes that it will go away.

      You wouldn't have to if you were running Linux.

      --
      "A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
    2. Re:Sometimes ... by uninformedLuddite · · Score: 0

      Once upon a time I could enjoy the back and forth in these comments. Being exposed to much that was stimulating, fascinating and sometimes truly amazing.
      These days I would rather headbutt than moderate. What the fuck happened to this website?

      --
      The new right fascists are bilingual. They speak English and Bullshit.
    3. Re:Sometimes ... by Anonymous Coward · · Score: 0

      See, if you had read the article you'd know a reboot isn't a real fix, you need to investigate and fix the problem or it might come back. But I think I have a solution for you; put "127.0.0.1 slashdot.org" in your hosts file and you won't see any more stupid articles when you browse to the slashdot home page.

  20. Not just *nix by Spad · · Score: 2

    The same argument can be applied to Windows servers; sometimes rebooting will only make things worse, or at least not make things any better. Unfortunately, these days the trusty reboot is often the first option instead of the last resort; at the very least some basic troubleshooting needs to be done to identify potential causes before you likely erase half the evidence.

    I suffer from a desktop variant of this issue at work, whereby re-imaging has become the "troubleshooting" tool of choice, to the point that all thought has now left the support process so that I've witnessed an engineer re-image a PC 3 times (at 30+ minutes each time) before someone else identified that the issue was being caused by a BIOS setting and that re-imaging was a complete waste of time.

    Let's face it, if your admin/support staff are lazy and/or stupid, then it doesn't matter which approach they take because they're not going to fix the problem anyway.

    1. Re:Not just *nix by jimicus · · Score: 1

      When everyone's running a known desktop image, it's hard to justify putting much effort into troubleshooting a desktop PC when you can reimage it in 20-30 minutes.

    2. Re:Not just *nix by w0mprat · · Score: 1

      A good fraction of problems on Windows servers can be solved by restarting a service, or 3rd party system app. Sometimes logging off/on the user account that a service runs under is as good as a reboot. I would say this is a universal rule not just for UNIX but for any contemporary OS server or otherwise. These days we actually have things to try before a reboot is necessary.

      --
      After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
    3. Re:Not just *nix by gandhi_2 · · Score: 1

      no one who images a machine three times should be called an engineer.

      in defense of imaging, I respect this approach over "run hijack this, malwarebytes, registry cleaner, and spybot s&d.... then apply these file changes and registry hacks I found on the internets". that junk doesn't belong in an enterprise....

    4. Re:Not just *nix by Anonymous Coward · · Score: 0

      With a windoze box the same problem is buying a windoze box to begin with.

    5. Re:Not just *nix by Anonymous Coward · · Score: 0

      Let's face it, if your admin/support staff are lazy and/or stupid, then it doesn't matter which approach they take because they're not going to fix the problem anyway.

      Living in this hell at work and it SUCKS!

  21. Paul Venezia by Anonymous Coward · · Score: 0

    Paul Venezia is an uneducated piece of garbage.

  22. Broken Logic by Grindalf · · Score: 0

    Is it just me, or is the logic behind this article broken?

    --
    The purpose of existence is to make money.
  23. New rule for Slashdot by aztektum · · Score: 5, Insightful

    /. editors: I propose a new rule. Submissions with links to PCWorld, InfoWorld, PCMagazine, Computerworld, CNet, or any other technology periodical you'd see in the check out line of a Walgreens be immediately deleted with prejudice.

    They're the Oprah Magazine of the tech world. They exist to sell ads by writing articles with grabby headlines and little substance.

    --
    :: aztek ::
    No sig for you!!
    1. Re:New rule for Slashdot by Anonymous Coward · · Score: 0

      /. editors: I propose a new rule. Submissions with links to PCWorld, InfoWorld, PCMagazine, Computerworld, CNet, or any other technology periodical you'd see in the check out line of a Walgreens be immediately deleted with prejudice.

      They're the Oprah Magazine of the tech world. They exist to sell ads by writing articles with grabby headlines and little substance.

      Psssssst, so is /.

    2. Re:New rule for Slashdot by frequnkn · · Score: 1

      Please mod parent up. A lot. For me, the whole point of /. is having one place to go for salient, well-curated geek news. The more fluff, the less value - simple as that.

      -foo

    3. Re:New rule for Slashdot by Anonymous Coward · · Score: 0

      Second new rule: Submissions with links to articles taking up multiple pages of single pictures and short paragraphs are to be deleted with prejudice. They exist to sell ads by making readers go through multiple pages. Either use the actually readable print version of the page or don't submit it at all.

    4. Re:New rule for Slashdot by Rik+Rohl · · Score: 1

      They exist to sell ads by writing articles with grabby headlines and little substance.

      Just like /.

    5. Re:New rule for Slashdot by lanner · · Score: 1

      Meetoo. Please make this happen.

  24. True Scotsman fallacy by O('_')O_Bush · · Score: 1

    Did anyone else notice the reek of the True Scotsman fallacy? If you agree with him, he brags about it. If you don't, he cites the reason to be because you aren't a TRUE pro-unix admin.

    Sorta grates on my nerves a bit.

    --
    while(1) attack(People.Sandy);
    1. Re:True Scotsman fallacy by RocketRabbit · · Score: 1

      You wouldn't be complaining about this if you were a real Slashdotter.

  25. so the people who supposedly spread this myth... by dAzED1 · · Score: 1

    The new crop of sysadmins are sort of funny. I wasn't aware that there was a myth among the unix ranks that rebooting a server fixed anything. Of course that doesn't fix anything.
    Are the people spreading this myth the same folks that log in as root because hey - they're the sysadmin, and access controls are for wimps?

  26. Rebooting destroys information by mangu · · Score: 1

    I never reboot unless the system hangs up completely. In recent years I had to reboot once, when the air conditioning failed and a server had a bad memory alarm.

    By keeping reboot as an extreme measure, I know when something truly bad happened. If I reboot without reason, I lose that information.

  27. Windows Too by Anonymous Coward · · Score: 0

    Actually I find I take the same approach with Windows as well. Most of the time if you reboot a Windows box the same problem is just going to repeat itself time and time again. So rebooting isn't actually a solution here either.

    Currently working as a software developer in a company, I got so pissed off with the servers rebooting all the time. When the resident IT guy was on holiday it fell to me (the next person who had any experience of doing this sort of thing). By the time he came back the network was running perfectly fine. Managers asked what I had done. I just said I fixed a few things properly, about 4-5 major things, instead of just rebooting ... Long story short: the IT guy isn't there any more. I took over his role on top of my own and the network has been running fine ever since ...

    Oh I got a chunk of the other guys pay for it too :)

  28. I would have docked you a weeks pay... by Anonymous Coward · · Score: 1, Interesting

    ...for wasting company time on non-solutions instead of doing a reboot that took 1 minute.

    1. Re:I would have docked you a weeks pay... by Overzeetop · · Score: 0

      I wish I had mod points. This is the kind of thing that gives engineers a bad name. Fix the damned thing and do your post mortem after the system is back up. It sounds like you all got shit done for two weeks because you had to be "right."

      --
      Is it just my observation, or are there way too many stupid people in the world?
    2. Re:I would have docked you a weeks pay... by pclminion · · Score: 1

      The "company time" was my time and the machines were ours to do with as we wished. My immediate boss was VP of engineering, and he was part of the effort. Not everyone is a cog in a vast machine.

    3. Re:I would have docked you a weeks pay... by Anonymous Coward · · Score: 0

      ...for wasting company time on non-solutions instead of doing a reboot that took 1 minute.

      And if I had managers like you working for me, you'd be fired.

      A reboot isn't a solution. If your problem went away after a reboot, chances are the problem will return. There is absolutely no company in existence (other than small businesses) that doesn't require a ticket to be filed with the exact reason the problem occurred and why a reboot is a necessary step for all non-upgrade reboots. They don't do that because of ego, they do that because they don't want to have to reboot again next week.

  29. Nothing to read here, move along.... by macson_g · · Score: 1

    It's the second time in a week that a barely interesting post from this guy's blog has made it to the main page. What's wrong with you, /. ?

    1. Re:Nothing to read here, move along.... by trollertron3000 · · Score: 1

      They're careless dumb assholes who don't do shit. That's what's wrong with slashdot. Don't believe me? Email them. If you get a response I'll suck your junk ASAP.

      --
      Tiger Blooded Bi-Winning Machine
  30. Separates the n00bs from the pros. by Anonymous Coward · · Score: 0

    If anything would break on a reboot of a Unix system, your sysadmins aren't doing their fucking jobs and need to be crucified on the shattered remains of a cabinet.

    If you can't reboot a specific, single server without production impact, your architects aren't doing their fucking jobs and need to be crucified on a whiteboard easel.

    If all you care about is some asinine 'uptime' number, turn in your fucking credentials now - you have no business being anywhere near a command line.

    I know this is Slashdot, and fanbois abound, but this is coming it a bit high. Yes, Suzy, even Unix systems need to be rebooted now and then.

  31. Eh? by ledow · · Score: 2

    - Design system
    - Build system (involves inevitable reboots)
    - Test system (involves inevitable reboots)
    - Move system into production.

    Once the services you need start up the way you want, don't play with it. Put it into service and have backups of the original image, any changes you make and a working replacement (Yes, have a working replacement - there is *nothing* better than having another machine sitting next to your server that can take over its job with the flick of a switch while you repair it - it also lets you test changes safely, and whenever you're sure the system is how you want it, you push the same image to your "copy of" server).

    If you do it properly, that machine will then stay up until hardware failure. Sometimes that *can* be years away. If you do it properly, you shouldn't ever, ever, ever be rebooting a server that's in production - you're just masking the real problem. Yeah, it'll work most of the time but it's just a way of papering over the cracks. The server hung, the service died, the settings got out of sync, or whatever, for a reason. Just rebooting is ignoring that reason for sake of service continuance - if the service is that vital, you should have high enough availability to cover such incidences or that same problem will come back to bite you later.

    Nobody cares about enormous uptimes, but having a server that you haven't NEEDED to touch in months is a good thing. It means that it has a well-defined function and has been performing correctly - that's your "stable" version and should be treated as such. Every time you make a change to a server, it then becomes a "current/experimental" version that you should be wary of.

    At worst, when a problem appears, you turn ON a replacement server and fix the one that is showing problems. If its role is well-specified, you don't get "feature creep" where it's running a million things that it never used to and they're not in your startup properly because it's never rebooted enough for you to test them.

    On Windows, or Unix, you shouldn't have to reboot. If you do, it's to test something or correctly reinitialise after fixing a problem (a post-solution reboot just to make sure it works as required isn't a bad thing but certainly not "required"). The worry of hardware failure on boot shouldn't stop you rebooting, and similarly you shouldn't reboot just to "spot" problems. Both suggest inattention and lack of suitable backups/replacements/high availability solutions.

    Systems can easily go 3-4 years in operation without requiring a reboot. If your hardware is good quality, you're monitoring the server as you should be, you have adequate backups/replacements and the role it performs isn't changed, there's no need to ever reboot it past initial testing. I have internal school servers that only get rebooted in the summer (i.e. once per annum) and that's only because the power goes off to upgrade the electrics each year.

    If it wasn't for that, I'd just leave them running. They don't need kernel 2.6.192830921830 and they have been doing that same job reliably for a LONG time. I'm not going to kick them into a reboot "just because". Similarly even the tiniest memory leak in their processes would cause me problems that I would spot immediately.

    As it is, 450 happy users all day long for years. The last one I installed actually took a whack from a collapsed networking cabinet coming off the wall (full of fully-populated Gigabit switches) and dropping six feet onto it. Apart from a small dent it carried on just fine, and the disks were idle, and SMART / data integrity show no problems. I rebuilt the entire network cabling around it because switching it off wasn't necessary. If it did reboot and it didn't come up in the expected state? There's a copy of it on another machine on the other side of the room - its predecessor that also didn't reboot for years but wasn't fast enough to run the amount of PHP / MySQL we needed it to among its other functions. Having the replacement machine

    1. Re:Eh? by Anonymous Coward · · Score: 0

      I assume you're magically including kernel security updates in this never-rebooted server?

      Security hole? In outdated Linux kernels? It's more likely than you think.

    2. Re:Eh? by Anonymous Coward · · Score: 0

      > I rebuilt the entire network cabling around it because switching it off wasn't necessary.

      Cool. So you complicate your network cabling just to avoid a reboot. Where do you work because I want to make sure I never work there too!

      > There's a copy of it on another machine on the other side of the room - its predecessor that also didn't reboot for years but wasn't fast enough

      Great backup!

      FFS ... rebooting as the first form of problem solving is wrong ... but not rebooting just because you can work round it is equally retarded. I work with systems that are all on 5 9's contracts ... 99.999% uptime. We have scheduled downtime 2-4 times a year where we can apply those patches, reroute the network cabling, service the UPS etc. We also have regular DR tests where we power down parts of the clusters and make sure that the rest takes over. This is how you admin boxes correctly!
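
      For anyone wondering what a five-nines figure actually buys you, here is a rough sketch of the arithmetic in Python. The function name is made up for illustration, and it assumes every minute of the year counts against the target; whether planned maintenance windows are excluded depends on the contract.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the minutes of downtime permitted over `days` at the given availability."""
    # Assumption: the whole period counts, with no exclusion for planned windows.
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return days * MINUTES_PER_DAY * unavailable_fraction


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

      At 99.999% that works out to a little over five minutes a year, which is why patching and recabling tend to get done behind cluster failover rather than by taking the whole service down.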

    3. Re:Eh? by Anonymous Coward · · Score: 0

      It's not magic, it's called ksplice.

  32. The article is almost an ad ... for Windows. by Jahf · · Score: 1

    "If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing. That's what was causing the segfaults and the wonky behavior. But since you rebooted the server without digging into the problem, you've made it much worse, and you'll soon boot a rescue image -- with all kinds of ponderous work awaiting you -- while a production server is down."

    That argument is somehow pro-Unix?

    I mean, yeah, a Windows person can screw with boot files, too. However if a Windows person were to read that paragraph it certainly wouldn't do a thing to encourage them on the solidity of *nix. It basically translates to "if you're having a problem, don't restart, you may not be able to boot again because your other admins may be incapable of writing proper scripts since every *nix system is different on its boot structure ... so ALWAYS do a full check of the existence of your system binaries before rebooting."

    Do Windows folks reboot too easily before examining logs, restarting services, etc? Sure. But this article extrapolates this point beyond the deep end.

    --
    It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
    1. Re:The article is almost an ad ... for Windows. by Enry · · Score: 1

      Of course, if the junior admin were using sudo (as TFA said previously is a Bad Thing(tm)), then you might have been able to see in the logs what commands were run and how they affected the system before restarting.

      Then again, if /boot and /etc are messed up that badly, you may as well give up and reinstall.

      Fortunately, almost all of our production servers use SystemImager, so reimaging a server from bare metal is just barely longer than the time it takes to reboot.

  33. Re:All too true by Shados · · Score: 1

    Actually it isn't. There's virtually always a reason why something screws up, regardless of if you're in Windows or Unix, and you won't need to reboot. The only exception is for patches, where Windows requires it a bit too often for comfort.

    I've worked for a few companies where rebooting a Windows Server for anything except patches/maintenance would require a full root cause analysis, and it pretty much never happened. We virtually always were able to find what was going wrong and fix it without rebooting. This isn't 1998 anymore: Windows Server absolutely can stay up for long periods of time, and there's always ways to prevent reboots.

  34. Additionally by Vekseid · · Score: 1

    If you need to increase the number of hugepages on a server, and memory is already seriously fragmented, doing that without a reboot is asking for a world of pain.
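
    A minimal sketch of how you might check whether the kernel actually gave you the pages you asked for, assuming a Linux box where /proc/meminfo reports HugePages_Total. The helper names and the sample text are invented for illustration; on a real system you would pass in the contents of /proc/meminfo after raising nr_hugepages.

```python
# Sample text in /proc/meminfo format; the values are invented for illustration.
SAMPLE_MEMINFO = """MemTotal:       16318548 kB
HugePages_Total:     512
HugePages_Free:      500
Hugepagesize:       2048 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value [unit]' lines into a dict of integer values."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a key/value line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def hugepage_shortfall(requested: int, meminfo_text: str) -> int:
    """Return how many requested huge pages the kernel has not allocated."""
    allocated = parse_meminfo(meminfo_text).get("HugePages_Total", 0)
    return max(0, requested - allocated)


if __name__ == "__main__":
    print(hugepage_shortfall(1024, SAMPLE_MEMINFO))
```

    A non-zero shortfall on a long-running box is the symptom described above: the kernel could not find enough contiguous memory to satisfy the request.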

  35. An Appropriate Hacker Koan by idontgno · · Score: 4, Funny

    courtesy of Appendix A of the Jargon File.

    Tom Knight and the Lisp Machine

    A novice was trying to fix a broken Lisp machine by turning the power off and on.

    Knight, seeing what the student was doing, spoke sternly: "You cannot fix a machine by just power-cycling it with no understanding of what is going wrong."

    Knight turned the machine off and on.

    The machine worked.

    --
    Welcome to the Panopticon. Used to be a prison, now it's your home.
  36. Please stop trolling by WaffleMonster · · Score: 1
    We have racks of windows and linux servers and they just work..0 OS specific problems ever..I mean for at least 5 years.

    Are the people complaining about windows crashing running consumer hardware without ECC memory and crappy lowest bidder 1U non-redundant PSUs going crowbar? Reliability thresholds of major general purpose OSs is all noise compared to physical hardware attributes nowadays.

    It is NEVER a good idea to reboot ANY system of any kind without first understanding what the hell is wrong with it. Various services, especially database services, may take hours (as in OFFLINE) to recover to a consistent state if arbitrarily rebooted.. I've seen it happen..too many times....it is not a lot of fun telling retarded sysadmins they have no choice but to sit on their hands and wait hours for a system to come back online because they got up and pressed the big red button.

  37. So what. 3650 days of uptime, who cares? by Anonymous Coward · · Score: 3, Interesting

    It makes a nice figure. Ten years. HP-UX running a few more or less referential databases. 3650 days. Was it patched properly? Did anyone *really* look after it? The only thing that can be said, is that it apparently was quite a stable machine room in terms of 10 full years of electrical & other provisions, more or less intact.

    Then it was shut down for good.

    I'd rather see regular maintenance breaks and maintenance windows (pun not entirely intended) than collect numbers in the uptime command's output. But the story is true: after I left that company not a single soul ever rebooted it. Ten years later they sent me an email with an attachment of a PuTTY session. Ten years, :)

  38. One case where a reboot is appropriate... by Anonymous Coward · · Score: 0

    I know of one case where a reboot is certainly appropriate. Suppose that you've found and fixed a problem but the fix involved changes to processes that occur at boot time. In that situation, it's a smart idea to reboot the server at some convenient time (this doesn't have to be done immediately) to make sure that the machine will reboot correctly and the problem will stay fixed. If you don't do this, you might be surprised the next time your machine comes back up after (e.g.) a power outage.

  39. Really? by Anonymous Coward · · Score: 0

    So.... It's so easy to mess up a Unix server to get it to the point where it won't boot properly, that they recommend never rebooting it?

    I'd rather have a server that I know will consistently boot properly and immediately start working, despite needing to reboot every few months than have a server that always works, but am scared to death to touch it because someone can so easily corrupt something. No matter how much planning and redundancy you have, you will eventually have a failure. When that happens, your users aren't going to be standing around praising the joys of a UNIX/Linux server while you are trying to figure out how to get the thing back online.

  40. Summary by fermion · · Score: 1
    On *nix boxes, admins are expected to be clever enough to actually fix problems so they do not happen again. The structure of *nix makes such fixes feasible, often without a reboot.

    On Windows boxes, the admins are not allowed to fix problems, so problems persist. To temporarily solve the problem until MS fixes the code, reboot the computer. It doesn't matter because the computer is going to have to be rebooted anyway when MS issues an update.

    To be fair, the main reason I don't reboot my *nix partitions is that I am never sure they will come back up. Say what you will, the nice thing about Windows is that no matter how it is damaged and how bad the situation is, it will attempt to come back up in something resembling a working state. Probably not a known state. Probably not a secure state. But usually a workable state.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    1. Re:Summary by ledow · · Score: 1

      "Say what you will, the nice thing about Windows is that no matter how it is damaged and how bad the situation is, it will attempt to come back up in something resembling a working state. Probably not a known state. Probably not a secure state. But usually a workable state."

      Ha ha ha ha ha ha ha ha.

      There speaks a person who's never rebooted a Windows server to have it suddenly decide to wipe out its registry, its AD, etc. and then throw an absolute fit. UNIX-alikes either do something, or tell you why they can't. At worst you get hideous warnings and a chance to fix the system.

      Windows has a tendency to mask the real problem, such as an unfinished write (and thus a boot off a CD into recovery console to run a CHKDSK util which seems to just destroy whatever it feels like until it thinks order is restored and then reboot-and-cross-your-fingers to hope it wasn't anything too vital - because you won't ever know what it decided to wipe out).

      I've seen Windows refuse to boot because a Java certificate inside a well-known UPS manufacturer's monitoring utility expired and it threw all the toys out of the pram - Windows honestly could not boot because of that problem and the server on the UPS totally destroyed the point of HAVING a UPS in the first place. In Linux? At worst your load average would go through the roof and you could kill the process. In Windows, you couldn't even get to a logon dialog or start any remote services.

      Windows doesn't take "damage" at all well - it boots or doesn't and if it does boot not everything will start (I routinely carry MS's WMI diagnostics utility because that service has a tendency to fall over on every Windows server I've ever seen and take lots of things with it (e.g. the ability to backup!)). And what *doesn't* start, you won't know about unless you check *everything* religiously.

      UNIX-alikes log everything and throw fits if prerequisites are broken, in hugely verbose messages.

      And don't even *breathe* when you see a Windows server "starting network connections".

    2. Re:Summary by NotSanguine · · Score: 1
      Amen to that. I've seen so many Windows servers corrupt themselves so badly as to make them unbootable.

      Then again, there was a time when Sun boxes would crash if '/' was full too.

      That said, it seems that Paul Venezia (the author of the waste of time known as TFA) is writing for middle schoolers (or managers). Either way he (and his pedantic rantings for the uninformed) shouldn't be posted here. Just read the comments from this thread and you'll get more useful information than this guy has ever written. Perhaps we should get Jackass--err--Mr. Venezia to just steal from /. posts. At least that way he might write something worthwhile.

      sigh!

      --
      No, no, you're not thinking; you're just being logical. --Niels Bohr
    3. Re:Summary by uninformedLuddite · · Score: 1

      You know what truly amazes me about your comment?
      It's the fact that it is arse backwards to reality and yet it hasn't been modded +5 Insightful. Maybe school's not out yet?
      Maybe if I give it a few more hours.

      --
      The new right fascists are bilingual. They speak English and Bullshit.
  41. rostomi by Anonymous Coward · · Score: 0

    www.webpages.ge Site catalog

  42. We reboot every 6 months by Iphtashu+Fitz · · Score: 1

    I work in an environment of literally hundreds of linux & Solaris systems, and we reboot pretty much every single one every 6 months on a regular scheduled patching cycle. We have systems broken down into groups of test/dev, staging, and production. When it's time for a new patching run we obtain all the vendor patches up to that point and apply them to the test/dev systems. After giving people a week to test & verify then we apply them to staging, and then a week or two after that during defined maintenance windows we apply them to production. If we encounter any problems along the way we address them. If we had a major problem arise along the way we'd push out the production patching until we had everything resolved on the staging systems. This method has worked for close to 10 years at this organization and we have no intention of changing it. We control access to the production systems, and all configurations are backed up & managed via a combination of homebrew code and cfengine, along with nightly tape backups of all the production systems. If any significant problems occur on production systems then it's not all that difficult to rebuild a machine from bare metal using the saved configurations and restoring any other data from the backups.
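
    The promotion rule in that cycle can be sketched in a few lines of Python. This is only an illustration of the gating idea (hold a stage until its problems are resolved), not the homebrew code or cfengine setup mentioned above; the stage list and function name are assumptions.

```python
# Rollout order described above; the exact labels are illustrative.
STAGES = ["test/dev", "staging", "production"]


def next_stage(current: str, open_problems: int) -> str:
    """Return the stage the patch set should sit at next.

    The rollout holds at the current stage while any problems remain open,
    and production is the final stage.
    """
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if open_problems > 0 or index == len(STAGES) - 1:
        return current
    return STAGES[index + 1]


if __name__ == "__main__":
    print(next_stage("test/dev", open_problems=0))
    print(next_stage("staging", open_problems=2))
```

    The point of the gate is that production is never patched with something that still has an unresolved problem one stage earlier.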

  43. Re:All too true by E-Rock · · Score: 1

    It isn't a valid strategy for Windows Servers either. Rebooting is a lazy fix for any platform. It *may* make the problem go away, but you have no clue why. Often you can find the crappy program that's causing the problem and kill it or fix it and no reboot is necessary. It takes longer, but only once. Restarting a service is way faster than rebooting and you don't have this mystery problem hanging over the box.

    I have more than 100 Windows Servers under management and other than patches we don't reboot them. We have some poorly written software that we script to shut down every night, but we're restarting that bad app, not the box.

    As linux starts to pick up steam, the same crappy admins who make windows look worse than it is are bringing their poor skills into the *nix realm.

  44. Not a myth I've heard... by asdf7890 · · Score: 1

    Not a myth I've heard uttered by people with real unix(-a-like) experience. If a service is not functioning correctly you might restart that service, and maybe its dependencies, but not the whole machine.

    The only time a server should be rebooted is after a kernel update or after configuration changes that you "know" are right but need to verify stay right after a reboot. I do sometimes reboot machines at other times just to make sure all is well so I can be reasonably assured that everything will come back up after, say, a power outage. None of these times happen when there is a known problem to be investigated - the reboot happens at a planned time (well, within a planned window) outside of working/demand hours. Rebooting a machine to fix a problem is no better than "close all your windows and see if it happens again" guesswork.

    Some suggest rebooting to force an fsck occasionally, to ensure the filesystems are in consistent order, but this can usually be done without a full reboot (unless you suspect the root filesystem may need checking) - just stop all the relevant services and umount the filesystem.
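
    Before running fsck on a non-root filesystem this way, it is worth confirming the device really is unmounted. A rough Python sketch, assuming text in the /proc/mounts layout (device, mount point, type, options, ...); the helper name and sample data are invented, and it does not handle the same device appearing under a different name (e.g. a /dev/mapper alias).

```python
# Sample text in /proc/mounts format; devices and paths are invented.
SAMPLE_MOUNTS = """/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /srv/data ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""


def mount_points_for(device: str, mounts_text: str) -> list:
    """Return every mount point where `device` is currently mounted."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


if __name__ == "__main__":
    still_mounted = mount_points_for("/dev/sdb1", SAMPLE_MOUNTS)
    if still_mounted:
        print("still mounted at:", ", ".join(still_mounted))
    else:
        print("not mounted; OK to check")
```

    An empty list means the umount took effect; anything else means some service is still holding the filesystem and should be stopped first.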

    1. Re:Not a myth I've heard... by BitZtream · · Score: 1

      Some suggest rebooting to force an fsck occasionally, to ensure the filesystems are in consistent order

      Someone who doesn't trust the kernel and fs to be sane under normal operations is probably using the wrong kernel/OS or file system, they just may not be to the point where they realize it yet.

      If you think you need to run fsck on a clean FS, you need to realize you aren't an admin, just a guy with root.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:Not a myth I've heard... by asdf7890 · · Score: 1

      Even though I trust the kernel (except brand new features: I let the early adopters get scalped there) and the protection I have set up (everything of consequence RAIDed and in regularly tested backups online/onsite+offsite+offline), hardware errors still happen and bugs are still going to be present. Call it being overly paranoid if you will but I'd rather check everything is OK every now and then than just trust that it is. The fact that most setups default to running a full fsck on start every X days (even with no unclean umounts) implies I'm not the only one who thinks this isn't a bad idea.

      As for "just a guy with root": I'd not go that far, but I'm perfectly happy to admit I'm far from an expert admin. I do consider myself to have more knowledge+experience than many people who do claim to be experts though!

  45. Memory fragmentation by redelm · · Score: 1

    Even if you don't have power accidents or hung hardware, it is probably a good idea to reboot Unix and other Linux-like boxen every so often. To clear out memory fragmentation and (horrors!) kernel memory leaks and stale kmallocs().

    How often is a good question. Perhaps yearly on a lightly loaded box (I like doing it in Dec so `ps` shows me the year). Maybe monthly on a loaded box. I have noticed a speed-up.

    1. Re:Memory fragmentation by Anonymous Coward · · Score: 0

      Even if you don't have power accidents or hung hardware, it is probably a good idea to reboot Unix and other Linux-like boxen every so often. To clear out memory fragmentation and (horrors!) kernel memory leaks and stale kmallocs().

      How often is a good question. Perhaps yearly on a lightly loaded box (I like doing it in Dec so `ps` shows me the year). Maybe monthly on a loaded box. I have noticed a speed-up.

      This post: make up for not having a real argument by tossing in some jargon and setting it in a conversational tone. Memory fragmentation? Are you _serious_? Fail.

    2. Re:Memory fragmentation by BitZtream · · Score: 1

      To clear out memory fragmentation and (horrors!) kernel memory leaks and stale kmallocs().

      Again, 1995 called, they want their broken ass kernel back. Memory fragmentation is simple to fix at runtime on pretty much any hardware platform with an MMU ... so everything Linux runs on.

      Kernel memory leaks and what have you indicate you've picked a broken OS to use, not that reboots are needed.

      Are you sure you aren't talking about Windows? I switched to FreeBSD from Linux years ago but I find it hard to believe you're running into memory issues due to fragmentation or kernel leaks nowadays. People use it in production, it can't be that bad ... RIGHT?

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    3. Re:Memory fragmentation by redelm · · Score: 1

      I haven't made myself clear: Yes, when big blocks aren't available, the kernel will stitch together scattered pages in whichever Descriptor Table to make whatever malloc() is required. The MMU will make this transparent and nothing should crash. But it won't necessarily be quick -- 1 MB takes 256 4kB pages.

      On some frequency, I like to bring a box down to minimum processes, clear /tmp, run fsck's, drop_caches, and restart procs. The proof is in `free` and SysRq-M. Linux/*BSD/Unix are _much_ lower maintenance than MS-Windows, but not _zero_.
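
      A rough way to look at the same thing without SysRq: on Linux, /proc/buddyinfo lists, per memory zone, how many free blocks exist at each order (order n = 2^n contiguous pages). The sketch below parses text in that layout and counts free blocks of at least 1 MB. The sample numbers are made up, the helper names are hypothetical, and the 4 kB page size is an assumption taken from the comment above.

```python
PAGE_KB = 4  # assumed page size; real systems may differ


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]} from buddyinfo-style text."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node", "0,", "zone", "Normal", count0, count1, ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_blocks_at_least(counts, min_kb, page_kb=PAGE_KB):
    """Count free blocks whose contiguous size is at least min_kb."""
    return sum(n for order, n in enumerate(counts) if (2 ** order) * page_kb >= min_kb)


# Made-up sample in the /proc/buddyinfo layout (orders 0 through 10).
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal   5120   2048    512     64      8      2      0      0      0      0      0
"""

if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        print(node, zone, "free blocks >= 1 MB:", free_blocks_at_least(counts, 1024))
```

      To try it on a live Linux box, pass the contents of /proc/buddyinfo instead of SAMPLE. A zone with plenty of small blocks and zeros in the high-order columns is the fragmented case being described.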

  46. bullshit by Sloppy · · Score: 1

    He cites as one reason, that someone might have deleted stuff in /boot or /etc or /usr/lib64 so the machine might not come back up quite right. I would sure as hell rather be confronted by such surprises during scheduled downtime than after a power outage when the UPS failed and I already have enough problems. Problems like what he's talking about, while they kind of presume you have already failed as an admin (why are clueless people deleting things in /boot?) are a reason you should reboot. The sooner I know about it, the better, especially if it's at a less panicky time like on a weekend when nobody cares if I have to spend an extra hour with a rescue CD.

    Also, sometimes it's just plain convenient. If the reason you're working on the box is that you just moved your /var to a new device (maybe you were changing the kind of filesystem you used, or wanted to expand it but are using a filesystem that isn't easy to resize (though I can't think of one right now)), good fucking luck unmounting it so that you can remount it on its new device. Why go through the hassle when you can just reboot and have things magically work?

    And then there's kernel updates. You're not going to upgrade from 2.6.n to 2.6.(n+1) by reloading modules. And yes, I have heard of some hacks for people updating kernels without rebooting, but if you think about how these things work, they're a hell of a lot scarier than rebooting is.

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    1. Re:bullshit by BitZtream · · Score: 1

      (why are clueless people deleting things in /boot?)

      Better question, why the fuck wasn't / (and /boot) mounted read only in the first place?

      You're not going to upgrade from 2.6.n to 2.6.(n+1) by reloading modules.

      No, you wouldn't do that with Linux, Linux has absolutely 0 concern about binary compatibility, it is in fact almost something they strive to avoid.

      However, you can do it pretty easily on a lot of commercial unixes and on the *BSDs as well in most cases. A properly designed kernel shouldn't be changing that often in ways that are going to break modules unless the kernel authors don't give a flying fuck about anyone else's work, hence why you wouldn't do it with Linux but on Solaris, which maintains binary compat specifically for this sort of thing, well ... that would be an entirely different ball game.

      Same for Windows for that matter, although most people don't have any clue that it's possible.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  47. But the Dinosaurs are getting in! by Anonymous Coward · · Score: 0

    I must reboot this unix server now.

  48. Re:All too true by mdsharpe · · Score: 1

    My several years experience with numerous Server 2008 (yes that's the same kernel as Vista) and, more recently, 2008 R2 based boxes suggests otherwise. Perhaps your servers are badly configured? Windows Server is a rock solid OS.

  49. Weekly reboots by msobkow · · Score: 1

    I reboot all my systems once a week (Sunday). Better to find out about messed up init scripts and such in a timely fashion than while everyone is screaming about unplanned downtime. And I have found and fixed such errors from time to time, so it's not a bad idea to get the issues resolved before they become a problem.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:Weekly reboots by bobstreo · · Score: 1

      Must suck for all your clients/app developers/customers in China, India, Japan and Australia.

  50. What the ... ? by X.25 · · Score: 1

    In 20+ years of working in UNIX environments, I've never met anyone who would reboot a UNIX box to 'fix' a problem or 'clean up' anything.

    Maybe next time you should post an article about something a monkey typed on a computer. Would probably be worth reading more.

  51. /tmp by bussdriver · · Score: 1

    Some setups do not clean up the /tmp directory - mine seems to just fill up forever until I reboot. Maybe it does something on its own, but I've never checked or run into a situation where I needed to find out; an update or paranoid reboot deals with it.

    Memory fragmentation? people care about that with their 10Gig of RAM still? Well, to be fair performance has been coming back with smart phones and battery life on laptops ...now if we could just get people back to knowing what a pointer is...

    1. Re:/tmp by Yvan256 · · Score: 1

      now if we could just get people back to knowing what a pointer is...

      Hey, we do know what a pointer is! ... you're talking about those laser pointers, right?

    2. Re:/tmp by redelm · · Score: 1

      Manually cleaning up /tmp is fairly easy. Memory fragmentation is still an issue because old blocks can break up big blocks such that large blocks are no longer available. Try the Magic SysRq-M.
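
      For the /tmp part, a minimal sketch of what "manually" could look like: list regular files that have not been modified for some number of days. The function name and the 10-day threshold are hypothetical, and the script only prints candidates; deleting anything (and deciding what is safe to delete) is left to the admin.

```python
import os
import stat
import time


def stale_files(root, max_age_days, now=None):
    """Yield paths of regular files under root not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished between listing and stat
            # Skip sockets, FIFOs, symlinks and anything else that is not a plain file.
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path


if __name__ == "__main__":
    # Dry run: print candidates only, remove nothing.
    for path in stale_files("/tmp", max_age_days=10):
        print(path)
```

      Modification time is only one possible criterion; files that are old but still held open by a running process would also show up here, so the output is a list to review rather than a list to delete.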

  52. Rebooting is often required for Security Patches. by Anonymous Coward · · Score: 0

    Rebooting isn't a crime--and is often necessary after applying security patches.

    Think about that the next time someone tells you it'd been 700 days since a reboot; 700 days of exploits you can choose from to assail that machine with.

  53. Obvious note and question by miruku · · Score: 1

    "The "my servers been up 3 years" e-pene days are gone folks."

    Slashdot Stats
    uptime: 1021 days, 23:23

    In what kind of instances might a test reboot be advisable after a change? Obviously our overlords have not felt the need to follow this particular advice..

    --
    MilkMiruku
    1. Re:Obvious note and question by sznupi · · Score: 1

      1 kibiday, here we come!

      --
      One that hath name thou can not otter
  54. ToughGuys-r-us by trollertron3000 · · Score: 1

    If a server going down is a big deal then perhaps you aren't as good as you think. Just a thought..

    Seriously, if your setup isn't virtual save the preaching because you're stuck in the past.

    --
    Tiger Blooded Bi-Winning Machine
  55. Rebooting is a "way of life" for Windows admins? by Anonymous Coward · · Score: 0

    Mr. Venezia must know a lot of piss-poor Windows admins. Or be one himself.

  56. Troll? Strawman? by Torodung · · Score: 1

    What is up with folks tagging this as a "troll?" Strawman? Really? There isn't a debate here, guys. It is not rhetoric. There is no strawman.

    This article is sound advice. He's basically saying don't do a reboot without doing an RCA first. If you need to reboot as part of your root cause analysis, fine, but for god sakes don't just shut down everything until you know why, that you need to, and whether it's going to come back up.

    This is good advice for Windows sysadmins too. Period. But Windows was not its focus. *nix was. This is because *nix is a bit more compliant to letting root completely hose the file system, on the fly (his "runaway script" example), and still be able to run.

    IMO, it was written as a caution to new *nix admins, possibly migrating from a Windows environment, period. RTFA, and take it at face value. You'll learn something if you're a novice to *nix.

    --
    Toro

    1. Re:Troll? Strawman? by trollertron3000 · · Score: 1

      Well it actually is a straw man because he sets up a case, the junior admin screwing up the machine, as a reason to avoid rebooting. That by definition is a straw man. Trust me, I'm a trolling expert and I know a troll when I see one. He's trolling for page views.

      --
      Tiger Blooded Bi-Winning Machine
  57. Ya, I have never heard of this by shovas · · Score: 1

    You might argue the "myth" of rebooting windows servers but I've never heard anyone with sufficient *nix experience saying that rebooting is a solution to anything except certain cases.

    --
    Selah.ca. Pause, and calmly think on that.
    1. Re:Ya, I have never heard of this by man_of_mr_e · · Score: 1

      If you're running a 24x7 server that can't afford to be down, then you also can't afford to take the time to troubleshoot problems if a reboot will make it start working again.

  58. SMART is *highly* overrated by v1 · · Score: 1

    so boot off a SAN, or learn how to read SMART messages.

    Tools that run periodic performance tests and continuous surface crawls will make a fool out of SMART every single time.

    (I have a great deal of experience on this front, and three boxes sitting behind me full of about 300 drives in various states of failure. do NOT rely on SMART)

    --
    I work for the Department of Redundancy Department.
    1. Re:SMART is *highly* overrated by sznupi · · Score: 1

      Tools that run periodic performance tests and continuous surface crawls will make a fool out of SMART every single time.

      Tools, such as?...

      --
      One that hath name thou can not otter
    2. Re:SMART is *highly* overrated by v1 · · Score: 1

      Tools, such as?...

      Mine's called "WatchDrives2". Let me know if you'd like to play with it. Over 5500 lines of a bash cron job. yes really, it's quite sophisticated and has a user interface. And it works, that's what matters, right? It even emails reports and warnings.

      So far in the past it's identified six of my drives in the very early stages of failing, and I have yet to lose data under its watch. (as in, not had to go to my backups) Two of them flipped their smart status while I was recovering data from them during replacement, (gee thanks for the notice...) and a third outright wouldn't spin up again after I finished my backup and powered it down to remove it from the case to test it further. (and that one's SMART was 'pass' the entire time) Makes you want to shout "HAH, I WIN!" as you throw the drive across the room into the trashcan.

      I don't know of anyone else making a tool that does anything like it. I've been told by several people that use it that I should rewrite it and sell it, but that's just not my interest I guess, I don't sell my software. You're welcome to try yourself. I'm sure eventually someone will.

      --
      I work for the Department of Redundancy Department.
    3. Re:SMART is *highly* overrated by sznupi · · Score: 1

      Ah, homebrew... I wonder a bit how the triggers are set, if they're at "pretty much anything except nominal" (oh well, too late question anyway; just a slow morning here, going through mailbox)

      --
      One that hath name thou can not otter
  59. Another Linux admin with a superiority complex. by nuckfuts · · Score: 1, Interesting

    Windoze admins...

    The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems. Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

    - require https over http to devices, yet still have telnet access enabled.

    I'm sure I have several devices on my network with telnet enabled. Why should I bother disabling it? I don't use it, so its vulnerability to password sniffing is irrelevant.

    And what do any of your gripes have to do with whether or not Unix servers should be rebooted?

    1. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      I'm sure I have several devices on my network with telnet enabled. Why should I bother disabling it? I don't use it, so its vulnerability to password sniffing is irrelevant.

      Does anyone else have a chance to get to the devices with telnet? How about a brute force dictionary attack? Or is this network in your basement? 8-)

    2. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      I'm sure I have several devices on my network with telnet enabled. Why should I bother disabling it? I don't use it, so its vulnerability to password sniffing is irrelevant.

      What if the telnet daemon itself is vulnerable?
      Rule of Thumb: Don't have unnecessary daemons running.

    3. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      > Being a Windows admin does not mean that one is uniformed or technically inept

      Yes it does, nahaa. :-P

    4. Re:Another Linux admin with a superiority complex. by Crackez · · Score: 1

      Windoze admins...

      The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems. Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

      Stereotypes exist for a reason. If it wasn't true for at least a statistically relevant number of samples, then the stereotype would not exist. Also, some things are derogatory on purpose, as they should be. Who cares about windoze admins anyway, it's not like they're *real* people... You might as well try to have a conversation with a retard. Sorry for any offense to actual retards out there...

    5. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      I don't use it

      said the sophomore admin that works alone. It's an issue when another admin uses telnet or does something else that's in a similar state. Systems should almost be self-describing or follow a standard pattern.

    6. Re:Another Linux admin with a superiority complex. by joeyblades · · Score: 1

      I work at a large engineering company where nearly everyone is a Windows user and about 1/3 are also *nix users. I don't know about how well informed my Windows admins are. I do know that the first thing out of their mouth, no matter what my problem is, is: "Have you tried rebooting?" If you answer "no", either they will ask you to reboot or they will reboot for you remotely.

      You will never, ever hear this question (or the corresponding recommendation if you answer in the negative) from one of our *nix admins. Rebooting is always the last option in the *nix world.

      I don't think that the issue is the intelligence of the admins. I think the issue is the amount of variability in the Windows environment. Despite the company's attempts to moderate it, many users have a lot of crap on their Windows machines. Rebooting probably evolved to be the default action when Windows machines misbehave... which is quite frequently, relative to *nix machines. Also, rebooting a Windows box rarely affects anyone except the user that is having the problem - it's a relatively benign strategy.

    7. Re:Another Linux admin with a superiority complex. by The+Moof · · Score: 3, Informative

      Why should I bother disabling it?

      Generally, good administrators tend to disable services that aren't wanted or needed in their systems. Who's to say that there's not going to be a vulnerability for the service discovered down the road (*coughSolariscough*) that would make you vulnerable?

    8. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      Windoze admins...

      The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems.

      Waaah fucking waah. It's a toy OS for toy computers.

      Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

      ...

      Correlation not being causation does NOT mean that correlation does not exist...

    9. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      Having a service open that you don't use exposes a needless security risk

    10. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

      Certainly - but I would be surprised if there were not a correlation.

    11. Re:Another Linux admin with a superiority complex. by wasabii · · Score: 1

      Oh. I run all Windows servers. Because they're, you know, better for some stuff.

    12. Re:Another Linux admin with a superiority complex. by FormOfActionBanana · · Score: 1

      Telnet runs suid root, so if it has any exploitable buffer overflows, they could allow remote root access.

      --
      Take off every 'sig' !!
    13. Re:Another Linux admin with a superiority complex. by phillips321 · · Score: 1

      And boom there it is, if you don't need it why is it running?
      More cpu cycles consumed.
      More memory consumed
      If a remote code execution vulnerability is released for that version of telnet then wham....
      .....these are the reasons that you would not be considered a good admin!

    14. Re:Another Linux admin with a superiority complex. by Anonymous Coward · · Score: 0

      Why leave a remote service that is unused and a potential risk, installed and running? It is one more thing that needs to be maintained. A flaw in Telnet, which is used for remote control of a machine, could be critical. Everything I have ever heard, and deduced myself indicates minimalism in software running on the server is important for stability and troubleshooting.

      Also he immediately backed his biases up with examples of common (and not so common) real world behaviors. I would be surprised if the windows admins near me could even telnet to begin with. Many people get the title of "windows administrator" because they are the person who can follow on screen prompts from an installation wizard the best.

    15. Re:Another Linux admin with a superiority complex. by djp928 · · Score: 2

      Windoze admins...

      The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems. Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

      Stereotypes exist for a reason. If it wasn't true for at least a statistically relevant number of samples, then the stereotype would not exist.

      Yeah, those damn lazy black people, always raping our precious white women.

    16. Re:Another Linux admin with a superiority complex. by siglercm · · Score: 1

      Windoze admins...

      The very first word in your "+5 Informative" diatribe is a derogatory term blanketing all administrators of Windows systems. Anything else you have to say should now be taken as extremely biased, if not plain ignorant. I've been an administrator of Unix systems for over 20 years, and an administrator of Linux and Windows servers since their early days. Being a Windows admin does not mean that one is uninformed or technically inept, any more than being a *nix admin makes one smarter.

      Stereotypes exist for a reason. If it wasn't true for at least a statistically relevant number of samples, then the stereotype would not exist.

      Yeah, those damn lazy black people, always raping our precious white women.

      "What is, 'Hey, where the white women at?' I'll take Racist Movie Quotes for $600, please, Alex."

      --
      sigfault (core dumped)
    17. Re:Another Linux admin with a superiority complex. by nuckfuts · · Score: 1

      You'll notice I said "on my network". I'm not advocating this stance for everyone.

    18. Re:Another Linux admin with a superiority complex. by nuckfuts · · Score: 1

      Generally, good administrators tend to disable services that aren't wanted or needed in their systems. Who's to say that there's not going to be a vulnerability for the service discovered down the road...

      Can't argue with that. It's just that on my own network, I get to choose my own balance of laziness vs. paranoia.

    19. Re:Another Linux admin with a superiority complex. by Crackez · · Score: 1

      Yeah, those damn lazy black people, always raping our precious white women.

      Yeah, but at least I didn't compare Windoze admins to Black people. You are not allowed to offend black people you know; that's how you get stabbed.

    20. Re:Another Linux admin with a superiority complex. by ibbie · · Score: 1

      - require https over http to devices, yet still have telnet access enabled.

      I'm sure I have several devices on my network with telnet enabled. Why should I bother disabling it? I don't use it, so its vulnerability to password sniffing is irrelevant.

      I'm curious as to why you wouldn't disable telnet on those devices, if you don't use it.

      --
      The wise follow a damned path, for to know is to be forsaken.
  60. time to stop reading /. by Anonymous Coward · · Score: 0

    Ok, if an article like this is making the front page of slashdot, then the reader base must have shifted a lot more than I had previously realized. Duh, reboots are for upgrades -- this has been established a very long time ago. I mean this isn't even a holy war issue which you expect to have flare up every now and then. If the article were about guidance on upgrades that would be different. The whole "we do a monthly reboot" regardless is silly; I just don't see it, wouldn't you be better served figuring out how to make the delicate parts of the root partition read only?
    I mean really, if you are unsure that your system will reboot after you make some change, then why in the world are you making that change on a production server?

  61. *nix is not a perfect system, but it's close by bl8n8r · · Score: 1

    Uptime numbers are just another stupid dick-measuring contest. Resources consume ram and some resources leak memory. You can't reclaim that leaked ram without a reboot.

    Kernel updates and some system patches take reboots too. If you're not rebooting, you're not current.

    You cannot fsck a mounted slash. fsck always finds something wrong on the filesystems when I *do* reboot after a year of uptime.

    This business of rebooting a server every time there is an issue is a sign of a larger problem. There are more options available for fixing problems on *nix systems, that's just the way it is. You don't always *need* a reboot to fix things, but sometimes you do (stuck tape drives, zombie processes, crappy iSCSI software, etc).

    --
    boycott slashdot February 10th - 17th check out: altSlashdot.org
    1. Re:*nix is not a perfect system, but it's close by BitZtream · · Score: 1

      Uptime numbers are just another stupid dick-measuring contest.

      True, but they are an indication of an admin's ability to keep the machine functioning as well. Gold medals in the Olympics are a dick measurement as well, stop being a bitch about it just because yours is small.

      Kernel updates and some system patches take reboots too.

      Kernel patches need a reboot, full stop. System patches do not, a libc update would require pretty much all your services restarted to take advantage of it, but do they need to? If you install a patch to libc to fix a timezone issue for Botswana and you'll never deal with that timezone ... and then you reboot your box because of it, you're just a fucking idiot, not a good admin. You introduced a change to a stable system for no reason, horrible horrible engineering practice.

      If you're not rebooting, you're not current.

      So you're calling uptime a dick swinging contest, but you don't think this statement is the exact same thing? I'm not sure about whatever OSes you use, but none of the UNIX machines I admin have had a remote kernel exploit in years, everything else can be handled by applications except for some weird privilege escalation bugs, but again, I haven't seen one of those in years. Do I not have new kernel features by running older kernels? Nope, I don't, but does it MATTER if I don't have those features? I don't upgrade because someone released a patch. I upgrade because I have a reason, it results in FAR FAR fewer headaches when you need your machines to be reliable rather than bleeding edge.

      You're upgrading because you think 'omg must be current' which is just as ignorant as caring about uptime without considering anything else.

      You cannot fsck a mounted slash. fsck always finds something wrong on the filesystems when I *do* reboot after a year of uptime.

      The OS you are referring to is broken in multiple ways. First off, what OS can't fsck the root partition while mounted? Mine both do it by default, about 30 seconds after the boot process is more or less complete. I can certainly do it by hand after boot as well. If your machines are always corrupt after a year's uptime with a clean filesystem then you definitely need to get your broken ass OS fixed. Don't know which one you're referring to, but it's clearly broken if you're getting disk corruption during normal runtime. There is no way that it is acceptable to have an unclean root after a clean shutdown, I suggest you pick another OS or an FS on your OS that isn't broken.

      This business of rebooting a server every time there is an issue is a sign of a larger problem. There are more options available for fixing problems on *nix systems, that's just the way it is. You don't always *need* a reboot to fix things, but sometimes you do (stuck tape drives, zombie processes, crappy iSCSI software, etc).

      Zombie processes and stuck tape drives cause you to need to reboot? Seriously? Zombie processes? 1995 called, they want their Linux back ... WTF kind of tape drives are you using that can't be fully reset from the OS? Hell, power cycle them, rescan the scsi bus and move on if you don't know how to fix your tape drives. iSCSI problems? Uhm, restart the server process or force an umount if the hung machine is the client. If your kernel is broken then it would be a bigger issue, but here again, you're using a broken/buggy OS and it's not something that would happen in my shop.
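
      On the zombie point, a minimal sketch of how one might find them without rebooting, assuming a Linux-style /proc where each /proc/<pid>/stat line starts with "pid (comm) state ppid". A zombie goes away once its parent calls wait() on it, or once the parent exits and init reaps it, so the parent PID is the useful part. The function names and the sample lines are made up for illustration.

```python
def parse_stat(line):
    """Split a /proc/<pid>/stat-style line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    fields = right.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def zombies(stat_lines):
    """Return (pid, comm, ppid) for every entry whose state is 'Z'."""
    found = []
    for line in stat_lines:
        pid, comm, state, ppid = parse_stat(line)
        if state == "Z":
            found.append((pid, comm, ppid))
    return found


# Made-up sample lines, truncated after the first few fields.
SAMPLE = [
    "1 (init) S 0 1 1 0 -1 4194560",
    "812 (backup job) Z 640 640 640 0 -1 4227084",
    "640 (cron) S 1 640 640 0 -1 4194624",
]

if __name__ == "__main__":
    for pid, comm, ppid in zombies(SAMPLE):
        print(f"zombie pid={pid} comm={comm!r} parent={ppid}")
```

      In this sample the thing to look at is process 640, the parent: getting it to reap its child, or restarting it, clears the zombie entry.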

      To the point: Everything you've said in your post makes it clear you are in no way a professional admin. You appear to have thrown some random modern buzzwords in with what I would say was true ... if I were running Linux in 1995. Now I don't run Linux today, I use other UNIXes so I can't say that it isn't that shitty any more, but I'm pretty sure it's not, and I know the *BSDs and Solaris act nothing like you describe. So what fucked up UNIX are you dealing with that makes you reboot so often? I want to make sure I get nowhere near it.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  62. *mostly* agree by Junta · · Score: 1

    I mostly agree with the sentiment, analyze a failure in place before rebooting. In fact, for a *problem*, I agree that it is a particularly bad time to blindly reboot.

    For updates, in practice, I disagree. In theory, he is right that very very few updates *demand* a kernel replacement and all other updates *should* be possible through restarting the 'right' things. In practice, there are significant problems.

    First, some kernel modules are written.. suboptimally and cause unrecoverable issues if unloaded or will not work right on reload. Some drivers are written and only tested with a PCI reset and POST cycle between driver changes. Sometimes reloading a driver if the firmware was updated during runtime will confuse a system. Downing some modules will remove functionality that is required for the system to run well enough to load the module again. Many of these are issues that if you knew in advance would induce you to pick a different vendor, but they are usually only apparent after it is too late.

    Secondly, depending on your Unix/Linux choice, updates may not be available for 'just' the modules you need, and the vendor may not even make clear whether the fixes are in the kernel or in the modules. A linux distro generally dumps a single update of kernel *and* modules. Even if you *knew* the kernel didn't have critical fixes, modprobe may refuse to load new ones against other kernels. 'Real' Unix *tends* to be potentially better on the latter point, and you *might* be able to comb over the updates with a fine tooth comb, hand-patch the source for the kernel tree that matches your running kernel, build and load. However, this is a ton more work and way riskier than 'just' rebooting.

    Finally, *particularly* with shared library vulnerabilities, there is a slim chance in practice you'll understand all the processes currently executing on your system enough to be 100% confident you'll hit all in-memory instances of buggy/vulnerable code. Figuring out the 'right' processes to restart or end is generally a larger logistical challenge.
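
    As a rough illustration of that logistical problem: on Linux, a process that still maps a library file replaced on disk shows the old path with a " (deleted)" suffix in /proc/<pid>/maps. The sketch below parses maps-style text and returns the deleted shared objects still mapped. The function name and sample data are hypothetical, and this only catches libraries whose files were unlinked or replaced, not every way stale code can stay in memory.

```python
DELETED_SUFFIX = " (deleted)"


def deleted_libraries(maps_text):
    """Return the set of deleted .so paths still mapped, from maps-style text."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname (pathname may be absent).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no pathname
        path = parts[5].strip()
        if path.endswith(DELETED_SUFFIX):
            path = path[: -len(DELETED_SUFFIX)]
            if ".so" in path:
                stale.add(path)
    return stale


# Made-up sample in the /proc/<pid>/maps layout.
SAMPLE = """\
55d0c0a00000-55d0c0a21000 r-xp 00000000 08:01 131 /usr/sbin/exampled
7f1c2a000000-7f1c2a1c0000 r-xp 00000000 08:01 262 /lib/libc-2.13.so (deleted)
7f1c2a1c0000-7f1c2a3c0000 ---p 001c0000 08:01 262 /lib/libc-2.13.so (deleted)
7f1c2b000000-7f1c2b021000 rw-p 00000000 00:00 0
7ffd5e000000-7ffd5e021000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    for path in sorted(deleted_libraries(SAMPLE)):
        print("still mapped after replacement:", path)
```

    Run over every /proc/<pid>/maps as root, this gives a list of processes to restart after a library update. It does not remove the point above: being confident the list is complete is the hard part.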

    In general, reboots should be feared in terms of disrupting the identification of root cause, but they are warranted for updates and for periodic sanity testing to make sure your system actually works on startup and matches how you expect it to be configured. If the service is critical enough to make people worry about reboot induced outage, then the service is not properly configured to run in a manner conducive to mission critical reliability (another server should be able to transparently take up load).

    --
    XML is like violence. If it doesn't solve the problem, use more.
  63. You can perform kernel updates without reboot by grilled-cheese · · Score: 1

    I don't suspect the author was aware of Ksplice. You can actually perform kernel upgrades without a reboot.

    1. Re:You can perform kernel updates without reboot by Anonymous Coward · · Score: 0

      You can also run every service on the box under root.

      Should you?

      If rebooting is painful - if something might 'break' - you're doing it wrong. And if your box will take more than a minute to reboot, you need to go to war with your superiors to replace your circa-1990 hardware. :p

  64. What about PC games? by Anonymous Coward · · Score: 0

    Ever been asked to reboot after installing a *GAME* on that other platform? This makes me ask the question: "What did you game publishers just do to my system? Are there any secret root-kits or other forms of malware that came bundled along with this game?"

  65. Replacing (persistent) myth with another myth by boorack · · Score: 1

    Why do so many UNIX admins try to keep servers up as long as possible? There is a valid reason to reboot your UNIX servers on a regular basis, and it's not about solving problems but rather about avoiding (potential) future problems.

    While many Windows folks tend to reboot servers when there is a problem (and it seems to work for them sometimes), Linux/UNIX servers are quite the contrary in this regard - these things often seem to work fine until reboot. And after reboot one often realizes that some things manually started or reconfigured while the server was running do not work properly, or that the server requires some additional (manual) work to make it work. There may be bugs in startup scripts (or missing links in rc?.d directories, or screwed dependencies), one can make some tweaks to some service without stopping it and then forget to put the changes into configuration files, etc. Of course, there are habits and tools to deal with it, but you cannot be sure that everything works after a reboot until you try it.

    I advise rebooting servers at convenient moments on a regular basis or after major reconfigurations just to make sure that every change is properly synced with configuration files and startup scripts.
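
    For the "missing links in rc?.d directories" case, a small sketch of a check that can run before the reboot does. It assumes a SysV-style layout with /etc/init.d scripts and S##name start links; the paths and the runlevel are assumptions, so adjust for your system. Not every init script is supposed to be enabled, so treat the output as a list to review, not a list of bugs.

```python
import os
import re

# Start links look like S20apache2; kill links (K80mysql) do not start anything.
START_LINK = re.compile(r"^S\d\d(.+)$")


def scripts_without_start_link(init_scripts, rc_links):
    """Return init script names that have no S## start link in the runlevel."""
    started = set()
    for name in rc_links:
        match = START_LINK.match(name)
        if match:
            started.add(match.group(1))
    return sorted(set(init_scripts) - started)


def check_runlevel(init_dir="/etc/init.d", rc_dir="/etc/rc2.d"):
    """Compare the real directories; both paths are assumed defaults."""
    return scripts_without_start_link(os.listdir(init_dir), os.listdir(rc_dir))
```

    It won't replace an actual reboot test, but it catches the "started by hand, never linked" class of mistake cheaply.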

  66. simple: risk management by Anonymous Coward · · Score: 0

    Especially in a large company.
    One that is perhaps grown via a katamari damacy style brute force accumulation of unrelated crap.
    Unrelated crap where more likely than not, there is not a single person left in the company who knows how it works, or specifically what it does ... except that someone, somewhere depends on it ... or not, who knows? You just know you got an alarm, or a blinky light, or a dead drive, or a bad network interface, yadda yadda.

    If you reboot, you don't know for sure that it's going to come back on ... and it hasn't been booted in 20 years.
    Thankfully with Unix type OS's, if you know what you're doing, you probably don't HAVE to reboot.

    So you scrub in, and you do some surgery, and you hope for the best.
    nobody wants to be up all night dealing with that shit, and trying to find someone who even knows what the machine does.
    nobody.

    and that is why ...

  67. +600 days uptime by Dr.+Tom · · Score: 1

    So the power failed. I was happy to tell people that a system that hasn't been rebooted in over a year is a system that is BADLY in need of an upgrade. I don't want to run software that old. I'm glad the power failed and gave me an excuse.

  68. +5 Intelligent Planning by Anonymous Coward · · Score: 0

    The general consensus of disaster recovery best practice is that you do not test a backup strategy, you test a restore strategy. Rebooting a server is testing a system restore process.

    Yes exactly. When push comes to shove, you are not a deity and you cannot guarantee that your boxes will never be powered off. As such, you need to be confident that they will also power on. Like a backup/restore, that confidence isn't based in, "this should work," but in, "this does work."

    The author's argument that you shouldn't need to reboot because faulty boot scripts are the mark of a poor sysadmin is bull. Everyone makes mistakes. EVERYONE. It is the height of arrogance to presume that you are somehow special and don't need to check your work. In the case of a startup script for a server, the only real complete test that it is valid is to reboot. Not testing your setup is not only poor IT skills, it's unscientific. If there is any remaining shred of computer science in the IT world, it should at least include testing of your hypotheses (assumptions).

  69. Reboot does fix some things by jmorris42 · · Score: 1

    Dunno, got one situation that a reboot fixes and I have looked.... and used Google, etc. So I present it as a question to Slashdot. Prove it can be solved without a reboot.

    Servers run a RHEL3 clone. Workstations run a RHEL5 clone. My laptop runs Fedora 12. We have a couple of Tcl/Tk scripts on the servers that we display remotely to the workstations and my laptop via ssh X forwarding. Everything is happy, happy, joy, joy... except when it isn't.

    Suddenly some remote X stops working, in particular the Tk ones stop. Basic xterms and even Firefox will start over the remote link perfectly. Even better, they almost always display on my laptop, even when everyone else in the building is sticking their head in the door complaining they can't get it to work, which makes it even more fun for me to troubleshoot. I have poked around. I have twiddled server tuning knobs, even ran strace on the apps. They seem to hang on the pipe to X but I can't see why. Rebooting will always fix it. One of the servers will usually recover in a few minutes without a reboot if we can afford to wait it out, the other doesn't do it as often but when it happens a reboot is the only way back... and the Tk app that fails on it is our timeclock app so waiting isn't a good option. But the machine with the timeclock also serves out home directories to every staff workstation and hosts the virtual machines that run our library automation system and it bites us hard to have to shut that all down. Thankfully that machine only exhibits the problem a couple of times a year.

    --
    Democrat delenda est
  70. Short version by Anonymous Coward · · Score: 0

    Stupid guy said something stupid, and now tries desperately to justify his theology instead of admitting he overstated something.

  71. Reboots are mandatory by NetServices · · Score: 0

    ... otherwise the Windows Updates won't get properly applied. :P

  72. Prophylactic reboots? Seriously? by nblender · · Score: 1

    Do you reboot your refrigerator once a month too?

    1. Re:Prophylactic reboots? Seriously? by Anonymous Coward · · Score: 0

      nop ... but on the other side I have never seen it trash the swap ... I will plug an rj45 and sell it as a rack

  73. Lost at the End by Anonymous Coward · · Score: 0

    Many essay/article writers feel compelled to conclude with an upbeat, smiley-faced "and here's how things will be better" paragraph at the end, and often it's a little puff of fluffy made-up nonsense that doesn't connect to the rest of the story. I've done it. The New Yorker does this a lot. And Paul Venezia does it here.

    Last paragraph: "The next time you're looking at a problem and someone says, 'Hey, let's just reboot the thing,' make sure you've exhausted every other possibility before you send it to init 6. The time and pain you save will definitely be your own."

    Really? No, I think that's the whole tragedy of the commons that makes rebooting more common than debugging. It's very possible that by punting the problem downfield (get it running quickly while masking root problems) someone else will in fact wind up taking care of it. Maybe the IT staffer on another shift, maybe a future employee after you've left, maybe a different helpdesk guy after you've got the client off your personal phone line. The economics of having multiple, rotating/replaceable IT personnel make it inevitable that the time/pain will not in fact be your own, and makes actual debugging non-incented.

  74. WOL by HalAtWork · · Score: 1

    Send it the magic packet?

  75. You SHOULD reboot UNIX boxes too by Max+Romantschuk · · Score: 2

    After making configuration changes it makes _a lot_ of sense to reboot if possible. That way you can determine that your changes indeed load properly after a reboot. You don't want that kind of a surprise when you have long since forgotten all the little tweaks in place.

    --
    .: Max Romantschuk :: http://max.romantschuk.fi/
  76. Full of nothing by Anonymous Coward · · Score: 0

    Not only does he make no point (I haven't seen any admin willing and happy about reboots)

    But he also tries to sell the idea of going through hoops to avoid restarting as being a good admin. If the most cautious way is to replace and restart, then you should do it. If you want to avoid the surprise (?) of a deleted /boot before rebooting, then create a checklist and keep a failover server, but stop creating more complicated and in the long run more error prone stuff just to show off.

    Availability comes not only from knowing how to keep the server online but also from strategies to keep the service online! If all you've got is "no reboot" as your strategy then you are screwed anyway.

    You are not going to impress girls with that kind of uptime.

  77. I rebooted once, never again. by mpcooke3 · · Score: 1

    I rebooted a Dell office linux fileserver once, just to ensure my new fstab entry mounted ok on startup. It didn't mount okay because I made an error in the fstab startup options. Although that turned out to be a more minor issue with the reboot, the much larger problem was that after rebooting the server it would no longer recognize any USB keyboard (I tried 5 different ones) - on any USB port - or via a plugin PCI card, and I couldn't work out any other way to give keyboard input without first having a normal working keyboard. I tried reseting the CMOS but that just resulted in the machine asking for an F-key to get pressed on startup.
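
    For what it's worth, a crude sketch of a pre-reboot sanity check for fstab edits. It only looks at field counts and the two numeric columns described in fstab(5); it knows nothing about whether the options are valid or the device exists, so actually mounting the entry on the live system (for example with `mount -a`) is still the real test. The function name is invented for this example.

```python
def check_fstab(text):
    """Return (line_number, message) pairs for obviously malformed fstab lines."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        # device, mount point, type, options, then optional dump and pass.
        if not 4 <= len(fields) <= 6:
            problems.append(
                (number, "expected 4 to 6 fields, found %d" % len(fields))
            )
            continue
        for label, value in zip(("dump", "pass"), fields[4:]):
            if not value.isdigit():
                problems.append(
                    (number, "%s field is not a number: %r" % (label, value))
                )
    return problems


if __name__ == "__main__":
    with open("/etc/fstab") as handle:
        for number, message in check_fstab(handle.read()):
            print("line %d: %s" % (number, message))
```

    The classic slip it does catch is a stray space inside the options column, which shifts every later field over by one.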

    After talking to Dell technical support for some time it was apparent they had no way to work around the issue either (short of replacing the motherboard) and we ended up throwing out the whole server and moving the disks to a new one.

    That was a late night, and one server i really regretted rebooting!

    PS We had a leaky roof and an umbrella over the server for some time, the server had got wet previously and still had drip marks all over it, so I wouldn't really *blame* dell. ..

  78. To Reboot or Not to Reboot by Anonymous Coward · · Score: 0

    If you don't reboot your unix servers you are in for a world of hurt when you need to. Most likely a server that has been up for a few years will not reboot without a disk or multiple disk hardware issue. Some issues might be recoverable some might not. You should have a reboot schedule to minimize server hardware issues on unscheduled reboots.

    1. Re:To Reboot or Not to Reboot by Anonymous Coward · · Score: 0

      What are you implying? Things only fail on a bootup? If that was the case, wouldn't they fail during your suggested periodic rebootings as well putting you in the same "world of hurt"? Or are you suggesting that things fail when they run for a long time? If that was the case is a monthly reboot really going to reduce your overall run time and save the components?

      There is no "fail meter" on hardware that magically gets reset back to zero during a periodic reboot of the OS. Here's a tidbit.. The hardware was running the entire time through the reboot, just the software was "restarted". Your NIC or HD doesn't know that you just rebooted, other than maybe the BIOS probing something that has not been probed in a while.
      Your logic...
      Wow, I was one day away from some failed sectors on my HD but since I've rebooted the OS, those sectors won't fail.

  79. What horrifies me by David+Gerard · · Score: 1

    What horrifies me about this article is that he had to write it. He had to actually say "if you just reboot at trouble, you haven't fixed the problem." This is actually a story worth him writing and worth someone putting on Slashdot.

    Mind you, I administer Solaris for a living and rebooting is something we do rather more often than we want to. It works way too often, too. We don't tell the NT admins this because they tend to snicker.

    --
    http://rocknerd.co.uk
  80. velociraptors by buback · · Score: 1

    I think we all know why you shouldn't reboot: It cuts power to the velociraptor cages.

  81. Rebooting makes sense by Ironpoint · · Score: 1

    Rebooting is important for finding hardware that is about to fail, bad fans, etc. It's also important in identifying one-time configuration that wasn't set up properly to persist across reboots. It's also good to ensure that a server will come back up, especially if it's a server no one typically monitors. And if rebooting renders a critical service unavailable, then the service needs to be redesigned so that it doesn't depend on a single machine.

  82. no time by Venik · · Score: 1

    Obviously, the "no reboot" dude operates in a generously staffed environment, allowing him plenty of time to dick around with stale NFS mountpoints and memory leaks. Most of us don't have the luxury of time.

  83. In production environments... by ickoonite · · Score: 1

    ...we schedule our Linux boxen to reboot weekly. Without fail.

    I work in an investment bank. There, they don't have time for this uptime-dick-size-contest shit. The longer a box is up, the greater the likelihood that some hardware failure is going to fuck it up and it won't boot again. A reboot is a great way to tease weird issues out into the open so that they don't screw you over at the critical moment. Of course we have redundant servers, but that's not the point.

    Ultimately, the 3 year uptime on that web server you brag about is a disaster waiting to happen. I can give a pretty much cast iron guarantee that when it does go down, whether by choice or unexpectedly, it won't come back up again smoothly. And then you're fucked.

  84. It's a persistent myth that slashdot is for nerds. by BitZtream · · Score: 1

    The first thing I learned about unix was 'you never reboot it'.

    The second major thing I learned about big UNIX hardware is because the shit is so finicky that rebooting may be its death. I learned that day, after watching a sysadmin argue with a Sun tech for literally an hour or more that it shouldn't be rebooted ... that sometimes the hardware doesn't work on the next boot, as was the case for this multimillion dollar Sun server that now decided it no longer had anything on boot flash so it was unable to boot ... because the Sun tech insisted he reboot for a kernel patch that had no relation to the problem at all.

    I don't know any UNIX admin who thinks they should reboot their server, ever. I know unix admins that will patch everything on the system and restart daemons even if they have to apply patches one file at a time (patch cluster failing to install on its own for instance), I've heard stories about admins patching binary code in memory JUST to keep a system up and running. UNIX admins take their work more seriously than anyone I know, including just about every Doctor I met.

    I know plenty of people who run Linux who reboot their machines for fun. I know plenty of Windows admins who reboot their servers at the first sign of failure.

    UNIX admins on the other hand login and find the problem, fix the problem, and let the machine live.

    For those saying 'OMG I DON'T WANT TO RUN SOFTWARE THAT OLD!!!!' ... well then you aren't an admin either. You learn rather quickly that when it ain't broke, don't fuck with it. Kernel flaws that can't be mitigated in application daemons or firewalls are pretty freaking rare, so there's almost no weight whatsoever behind that. Pretty much every exploit that exists is a userland exploit, not a kernel one, so unless you need a new feature in the kernel you rarely HAVE to screw with it nowadays, with loadable kernel modules and such. Hell, I haven't even rebooted recent FreeBSD builds due to kernel changes: just rebuild and reload the module in most cases. I'm certain FreeBSD isn't unique in this ability.

    You know what you call someone that thinks you reboot a server to solve problems? You call them a fucking Windows USER. Not an admin, and most certainly not a UNIX admin.

    If you think rebooting to fix an unknown problem is a good idea, it's because you don't know what you're doing. I don't mean that to be malicious, it's just the reality of it. An intelligent admin doesn't do shit until he/she knows what's going on.

    And with great sadness it brings me to my final duty as a Geek with a conscience ...
    CmdrTaco your geek card is hereby revoked. You have exceeded your 'retarded idiots posted to the front page' quota by at least 3 in the last month. This one just sends it over the top. You are no longer considered a geek. All privileges associated with your geek card are also hereby revoked.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  85. Proprietary Binary Code by unlocked · · Score: 1

    When a system runs all GPL'd open source software that is tested by tens of thousands of users and distro maintainers, then why would you need to reboot? It only comes into question when your system becomes littered with binary kernel modules and binary programs like Flash and god knows what other binary code gets put on without rigorous peer review. That's when you run into trouble and your system becomes unstable. Not to mention getting your repositories messed up. As more and more of the Microsofties embed themselves in Linux/Unix it will only get worse.

    1. Re:Proprietary Binary Code by unlocked · · Score: 1

      Also, old VAX machines had a multitude of terminals connected. How would you like your session to evaporate for no reason other than cleaning out the cobwebs, as some would have you believe?

  86. too general by Anonymous Coward · · Score: 0

    He kind of makes a roundabout commentary that linux is prone to issues that will cause a reboot to fail vs windows. It's an impractical stance: you don't always have the expertise to figure everything out, or your expert is not there. If a linux system goes nutty on you, it can be hard to figure out why. I've had nfs and lvs issues where a reboot was needed; I found out later there were known issues that caused the faults.

    You can't spend your day debugging everything, it should be safe to reboot, the reboot may be the fastest way back up and that counts more than anything with some servers. Those long uptime servers scare me, will they come back up? I tend to force reboot after any kind of change to make sure that they will come up.

  87. what myth? by Tom · · Score: 1

    That's a myth?

    Maybe among Fox News commentators, or windows admins, or some other kind of inferior life form. I've never heard a Unix admin say that, and I've been working with them or as one of them for over 15 years.

    You don't reboot a Unix system except for:
    * kernel updates
    * hardware replacement
    * you really, really, really have tried absolutely everything else and you need a minute of time to think while you look as if you were doing something else than staring blankly at the wall and you know it fools the windows-only boss-idiot.

    I know my Unix systems measure their uptimes in months and sometimes years, and I still remember that one time when I spent an hour looking for traces of what I thought was an unexpected reboot, until I finally found out that the Linux uptime counter rolls over after some 497 or so days.
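
    The rollover figure is easy to sanity-check. Assuming the uptime is derived from a 32-bit tick counter incremented 100 times a second (the classic HZ=100 on older Linux kernels; both numbers are assumptions about the machine in question), the arithmetic comes out to roughly 497 days:

```python
SECONDS_PER_DAY = 86400


def rollover_days(counter_bits=32, hz=100):
    """Days until an unsigned tick counter of the given width wraps around."""
    ticks_until_wrap = 2 ** counter_bits
    return ticks_until_wrap / hz / SECONDS_PER_DAY


if __name__ == "__main__":
    print("HZ=100:  %.1f days" % rollover_days(hz=100))
    print("HZ=1000: %.1f days" % rollover_days(hz=1000))
```

    At 1000 ticks per second the same counter would wrap in under 50 days, which is why the tick rate matters as much as the counter width.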

    You don't reboot Unix servers. Whoever thinks the premises of the article with its myth is remotely true - please don't let people like that within 10 feet of your servers.

    --
    Assorted stuff I do sometimes: Lemuria.org
  88. UNIX admins are lazy by OrangeTide · · Score: 1

    We learned from experience that it is easier to patch running binaries and manually restart services than it is to fix UNIX boot issues, especially on a remote server at 3am locked in a cage. The reliability of your booting on UNIX/Linux depends on the quality of your system maintenance.

    --
    “Common sense is not so common.” — Voltaire
    1. Re:UNIX admins are lazy by choke · · Score: 1

      My laziness contributes to my success. Everything I do is intended as an investment in doing less in the future.

      I will work pretty smart to lower my work hard factors (and work late, and work weekends, and work stressfully...)

      --
      "No good deed goes unpunished"
  89. Bad advice by Anonymous Coward · · Score: 0

    At some point the physical server will need to be rebooted. It is easier to recall a script that was changed 2 weeks (or 2 months ago) that failed during reboot than it is to recall all the changes performed over the last year.

    You don't need to reboot the "clean things up" very often on servers. That is very true.

    If you are using virtualization - and you should be - then you can migrate a running VM to another VM server (KVM, OpenVZ, ESX, etc) and not take it down, but the running kernel will eventually need to be rebooted, even in a VM.

    Most of my servers have uptimes that reflect when the last kernel update was made. Usually, those updates happen every 2-4 months. I'd post my uptimes now, but there was a kernel update 15 days ago so they aren't "impressive" if you are impressed by that. OTOH, ... here's an ESXi uptime:
      # uptime
      22:28:01 up 474 days, 5:16, load average: 0.00, 0.00, 0.00

    You shouldn't be impressed. It means I'm at least 1 patch level behind, perhaps 2.

  90. Re:Rebooting is often required for Security Patche by Culture20 · · Score: 1

    Rebooting isn't a crime--and is often necessary after applying security patches. Think about that the next time someone tells you it'd been 700 days since a reboot; 700 days of exploits you can choose from to assail that machine with.

    Wow, a kernel exploit in KVM which isn't installed on the system. A DoS kernel exploit in a network module that's not installed. That about sums up the usual 700 days of exploits.

  91. Eliminate Rebooting for Kernel Upgrades by Doc+Ruby · · Score: 1

    I thought the Linux community was working on upgrading kernels without rebooting. Just store the new kernel file(s) in the filesystem, and run some code to cut over from the old kernel to the new one. What happened to that?

    --
    make install -not war

  92. You a modern college student? by bussdriver · · Score: 1

    ONLY JAVA. I wouldn't be surprised if I heard a graduate say "whats a pointer?" or "is that like a reference?"

  93. Re:It's a persistent myth that slashdot is for ner by Doc+Ruby · · Score: 1

    Any HW that's so "finicky" that it won't necessarily run properly after a reboot cannot be relied upon, because reboots are sometimes either unpredicted, necessary or both. That machine is not worth "multimillion dollars", unless it's the subject of some kind of major R&D project - which I expect it wasn't. Fire the sysadmin and Sun tech, and replace them with a team that will make a machine that isn't a "high bit" away from failing the entire business.

    And I say that as someone whose standards are to reboot Unix/Linux machines only on HW upgrades requiring a power cycle, or a kernel upgrade requiring restarting init.

    --
    make install -not war

  94. What rumor? by markdavis · · Score: 1

    >"It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out."

    I have been using and admin'ing Unix (and Linux) systems for over 22 years. I have *never* heard of such a rumor. I hear it all the time for MS-Windows boxes and MS-Windows servers. And many appliance-like boxes based on MS-Windows (like our horrible security camera system) have rebooting itself BUILT-IN. But for Unix/Linux???? I think not.

  95. Wow... by multimediavt · · Score: 1

    Lots of 1, 2, 3 ratings in this thread. Well the reasons to reboot a Unix server are many, but it's contextual. Some have mentioned the obvious answers, i.e., to test a redundancy or restore operation, to verify hardware integrity, to verify that software patches will stick after a reboot, etc.

    Here's another good reason to reboot often from the HPC world, after every job that runs on compute nodes. HPC code bases are notorious for not exiting cleanly and heaven knows what residual processes or memory clogs are left on a node after a job runs. It's almost always good to run an epilogue script to clean up and reboot nodes after a job terminates. Hell, in some cases we completely re-image a node after a job to make sure things are clean.

    So, yeah, this "never reboot a Unix box" attitude comes from people who build boxes that don't change (stupid practice in the modern vulnerability-a-week environment), or are just plain ignorant altogether. If you are building production Unix servers that are single machines with no redundancy and aren't rebooted often (once or twice a year at least) you're not going to be working for much longer. The first major failure will find you on the unemployment line.

  96. Enterprise Class Systems by NeoMorphy · · Score: 1

    If you are paying a lot of money for Enterprise Class systems, then you should only have to reboot when a system patch requires it or if there is a system problem occurring that requires a reboot. But....... if you are rebooting because of a system problem, then I am assuming you are creating a system dump so that you can send it to tech support to identify what the problem was.

    The ability to identify system problems is an important skill, learn how to do it, don't just reboot hoping the problem will go away. Learn how to do system traces/tcpdump/etc. You should even learn how to do system traces on user processes so that you can tell them that the reason their application is failing is because it is trying to read a file that doesn't exist. Hand holding isn't in our job description, but it's better than rebooting for nothing. What do you put in your system downtime report? "Rebooted system, problem gone"? Steps taken to prevent problem from happening again, "Schedule Daily Reboots". No joke, there are "users" who think weekly reboots are a good idea. How do you manage over a thousand servers with weekly reboots, especially if there are dependencies? And why would anyone think that the admins are the only ones who are involved in having applications come up properly? We don't manage the databases; if the DBAs screw up a script that causes a database to not come up properly, the UNIX admins can't control that. But when some user wants a server rebooted because they don't know what they are doing and the database doesn't come back up, we still have to hang around at 03:00 in the morning while someone pages the DBAs to fix it.
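
    As an illustration of the "trying to read a file that doesn't exist" case, here is a minimal sketch that pulls failed lookups out of a saved trace. It assumes strace-style text output (truss and other tracers format things differently), and the function name is made up. Expect noise: plenty of healthy programs probe paths that aren't there, so the list is a starting point, not a verdict.

```python
import re

# Matches lines such as:
#   open("/opt/app/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
MISSING = re.compile(r'(\w+)\([^"]*"([^"]+)".*= -1 ENOENT')


def missing_files(strace_output):
    """Return (syscall, path) pairs for calls that failed with ENOENT."""
    hits = []
    for line in strace_output.splitlines():
        match = MISSING.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

    Feed it the contents of a trace file captured from the misbehaving process and read the last few entries before the failure first.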

    Enterprise Class systems are supposed to have high uptimes. You have redundant and hot-pluggable adapters, disk drives, you can even dynamically rearrange processors and memory. You have multiple fabrics leading to your SAN storage, and your SAN systems have hot-swappable adapters/disk drives/control units/power supplies and RAID storage. If you feel that it's okay to have frequent reboots, then you might as well be running inexpensive x86 servers without any redundancy. Personally, I prefer more sophisticated systems that rarely go down because of hardware, and if the OS has a bug that forces a reboot, I want the OS vendor to fix it!

    If an application team that is lacking in "problem determination" skills asks you to reboot a Linux server and their problem is still happening and the Linux server is actually running under VMWare, don't be surprised if they ask you to reboot the entire VMWare server, and now you have just bounced all of the guest servers running on that VMWare server. Meanwhile the original problem was caused by a change made by the application team, and a reboot should not have been needed.

    Can you imagine Air Traffic Controllers rebooting systems on a whim? There are plenty of important systems using computers that we would like to never have to reboot. We should be making that a goal instead of thinking reboots will always be needed at random times.

  97. My experience by wisnoskij · · Score: 1

    Never been in control of a unix server but my linux desktop needs rebooting all the time to fix issues it gets.

    --
    Troll is not a replacement for I disagree.
  98. So... by wisnoskij · · Score: 1

    So rebooting is not good because it might not fix the underlying problem, well of course but that is the same with windows.

    But it seems to me a good timesaver to at least reboot the first time you see an issue, and if it comes back then you know it is a recurring issue and that it might be worth spending hours fixing it.

    And I really do not understand how you can even get a system to stay online for as long as unix veterans do. I have never used a system (windows or linux) that did not get more and more unstable over time (even routers). Any device I wanted to stay up all the time I would have automatically reboot occasionally.

    --
    Troll is not a replacement for I disagree.
    1. Re:So... by bobstreo · · Score: 1

      The only real reasons to reboot a Unix system are, in no particular order:

      Patches that require a reboot (Kernel...)

      Upgrades that require a reboot.

      Hardware issues (replacing hardware on some systems actually require you to shut down the boxes)

      The every 5 year mandatory UPS power work.

      If you've virtualized and zoned the heck out of your whole environment, what part do you really reboot?

      And lets not forget, a reboot is NOT the same as a cold restart.

    2. Re:So... by wisnoskij · · Score: 1

      Ok those are the only reasons you !have! to restart a machine.
      I am not arguing that. I am just saying rebooting is not a big deal, there is no reason to go to lots of trouble to avoid it (other than bragging rights I guess).

      I would say the same can be said of reinstalling the OS. Many people (in both the windows and linux camps) go to extreme lengths to not do this, from being very careful to not get viruses to not installing programs on their machine to spending lots of time trying to get rid of viruses.
      Personally I just always know I can spend a handful of minutes and get back a working computer with a OS reinstall.

      Both the restart and reinstall are quick fix-all things you can do to a computer, I would even go so far as to say a computer user really only needs to know how to do these two things for them to handle all normal problems they will encounter.
      And unless you have a good idea of what is wrong in the first place, these fixes will normally be the quickest most likely to succeed things you can try.

      --
      Troll is not a replacement for I disagree.
  99. Business processes by Dennis+Sheil · · Score: 1

    The real answer to this question is it depends. If you have a small business, it might not be necessary to reboot your servers. Let us say however, you work for a financial company which trades from Monday to Friday, and demands high availability. Uptime on Saturday and Sunday does not matter much. Also, you are somewhat, but not always fully aware of what processes are running on the various systems. In a scenario like this, it makes sense to reboot machines. Sometimes you find equipment faults that you wouldn't have otherwise. You make sure that necessary processes are properly put into init/rc files. If you do not do this, and say your whole machine room experiences an outage one day, you might bring all the machines up and suddenly discover there are dozens of processes which were running which were never put into init/rc files properly. If you reboot once a week/month, at a scheduled time, this can be handled incrementally and without incident; having it happen all at once can be a nightmare.

    The answer is it depends on the business. In my experience, it is not IT departments (and thus companies) which reboot machines once a week that have troubles, but companies which have certain machines that never are allowed to go down that have troubles.
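
    A rough sketch of the audit a scheduled reboot performs implicitly: compare what is running against what is configured to start at boot. The service names are hypothetical, and how each list is gathered (process table, init/rc configuration) is left as an assumption.

```python
def missing_from_boot(running, enabled_at_boot):
    """Return running service names that are not configured to start at boot."""
    return sorted(set(running) - set(enabled_at_boot))


# Hypothetical inventory for one host; real data would come from the process
# table and from the init/rc configuration.
running = ["sshd", "cron", "pricefeed", "orderrouter"]
enabled_at_boot = ["sshd", "cron", "pricefeed"]

print(missing_from_boot(running, enabled_at_boot))
```

    Anything this prints would silently stay down after an unplanned outage, which is the situation the post describes.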

    1. Re:Business processes by dbIII · · Score: 1

      If you do not do this, and say your whole machine room experiences an outage one day, you might bring all the machines up and suddenly discover there are dozens of processes which were running which were never put into init/rc files properly

      One could be a mistake, but dozens means that a strawman is being constructed from a pile of bullshit.

    2. Re:Business processes by Dennis+Sheil · · Score: 1
      > One could be a mistake, but dozens means that a strawman is being constructed...

      As I said before: "it depends. If you have a small business, it might not be necessary to reboot your servers...however...for a financial company"

      There is no strawman. It depends on the business. I worked for a company where the number of servers being rebooted was in the thousands. Just for the UNIX servers, just in New York City, once a week we would find one if not several things that would not come up properly due to a lack of init/rc scripts. I should modify my original statement slightly though: the problems with rc/init scripts were not solely due to their absence. Sometimes the application-specific init/rc scripts were there, but they would be written wrong. Sometimes they would be written correctly but would depend on the database, and the application would start before the database did and fail. If machines were never rebooted and were all running for, say, three years, and then there was a sudden machine room crash, then yes, no hyperbole, no strawman argument, there would be dozens of broken init/rc scripts to deal with, amidst all the other problems that might come up if a machine room crashed. This company chose to handle this by regularly rebooting the servers, so the fixes would be put in incrementally. There is another benefit to this: a developer who forgot or messed up an init/rc script probably still has the application fresh in their mind. If you wait three years, the original developer may be gone, and working out how to fix it becomes more complex, especially amidst a machine room crash.
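
      The start-order failure mentioned above (application starting before its database) can be illustrated with a small sketch. It assumes dependencies are known as a simple mapping; the service names are hypothetical, and this is not how any particular init system resolves ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return a start order in which every service follows its dependencies.

    depends_on maps a service name to the set of services that must be up first.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical services: the trading app needs the database, which needs storage.
deps = {
    "tradingapp": {"database"},
    "database": {"storage"},
    "storage": set(),
}

print(start_order(deps))
```

      A script whose position in the boot sequence contradicts this order works fine on a long-running host and only fails at the next cold start.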

      How is one a mistake and dozens a strawman? It is simply a matter of scale, as I alluded to in my original post. Some shops have one developer and a few machines in one machine room, some shops have data centers internationally, with thousands or even tens of thousands of servers, and hundreds, or even thousands of developers. A rare problem in a small shop can come up often in a large shop, quantity transforming quality.

  100. Tracking tool not security measure by dbIII · · Score: 1

    In small shops where nobody with root can dodge responsibility you don't need sudo.
    It's really just a way to track who does what AFTER THE FACT and not a security measure. In fact at times it's a security hole if you are not really careful.
    If you are really careful and have a nice short list of what people can run as root it can work - but often with just a little more time you could tweak the permissions of the files so the right people can use them without going anywhere near root.

    It's no security measure, just a "who the fuck did that yesterday?" sort of tracking tool useful if a lot of people need root access to a machine. Nobody with much *nix experience spends much time logged in as root unless they have to anyway.

    1. Re:Tracking tool not security measure by Compaqt · · Score: 1

      For my part, I always try to do everything possible as non-root:

      Need to look up a command with man? non-root terminal. Use root when you have the syntax right.
      Finding some config or library file with find, locate, or ls? non-root
      Just reading a config file out of etc? less as non-root. Use vim as root when you're ready to modify.
      "aptitude show" or "yum info" can be done as non-root. I only use root for "aptitude install" or "yum install"
      Same for "service status" as non-root. "service start" as root.

      --
      I'm not a lawyer, but I play one on the Internet. Blog
  101. Totally missed the point by dbIII · · Score: 1

    Instead, have you considered that it's about people out of their depth in a different environment to what they are used to? Of course they make mistakes, and it pisses off people like the above poster who would prefer people to be competent at the job they are actually doing instead of a different one.

  102. Which system is there without kernel updates? by jschrod · · Score: 1
    I've got a question for all those folks who brag about $n$-years uptime and no need to reboot:

    What system are you running that doesn't need a kernel security update?

    I'd really like to know, I'd love to run such a system myself. All the systems that I run need kernel updates a few times a year, and thus need to be rebooted.

    Well, since I'm running my servers in HA environments anyhow, reboot confirms that no non-kernel updates messed with boot configuration w/o any loss for availability. So, it's not so bad after all.
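
    As an illustration of the point about kernel updates, here is a minimal sketch that flags a host whose running kernel is older than the newest one installed. The release strings are hypothetical, and the numeric ordering is a rough assumption that will not suit every distribution's naming scheme.

```python
import re


def version_key(release):
    """Turn a release string such as '5.15.0-97-generic' into a sortable tuple."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def reboot_pending(running, installed):
    """True when a newer kernel than the running one is installed on disk."""
    newest = max(installed, key=version_key, default=running)
    return version_key(newest) > version_key(running)


# Hypothetical values; on a real host `running` could come from
# platform.release() and `installed` from the kernel images on disk.
print(reboot_pending("5.15.0-91-generic", ["5.15.0-91-generic", "5.15.0-97-generic"]))
```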

    --

    Joachim

    People don't write Manifestos any more -- what's going on in this world? [Frank Zappa]

    1. Re:Which system is there without kernel updates? by Anonymous Coward · · Score: 0

      I'd really like to know, I'd love to run such a system myself. All the systems that I run need kernel updates a few times a year, and thus need to be rebooted.

      http://www.ksplice.com/

  103. tl;dr-If You Reboot You Can't Diagnose The Problem by littlewink · · Score: 1

    The title says it all: rebooting loses info that may tell you what went wrong. First diagnose the problem and find a solution. Then reboot.
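
    One way to keep that information is to snapshot the system state before the reboot. The sketch below is only an illustration: the command set is an assumption for a typical Linux host, and the function name is made up for this example.

```python
import subprocess


def capture_state(commands, timeout=30):
    """Run each diagnostic command and return {label: combined output}."""
    report = {}
    for label, argv in commands.items():
        try:
            done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            report[label] = done.stdout + done.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A hung or missing tool is itself a useful data point.
            report[label] = f"failed: {exc}"
    return report


# Assumed command set for a typical Linux host; adjust per platform.
DIAGNOSTICS = {
    "uptime": ["uptime"],
    "processes": ["ps", "aux"],
    "disk": ["df", "-h"],
    "kernel_log": ["dmesg"],
}

if __name__ == "__main__":
    for label, output in capture_state(DIAGNOSTICS).items():
        print(f"== {label} ==\n{output}")
```

    Redirecting the output to a file on a disk that survives the reboot gives you something to read afterwards.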

  104. The triumph of mass ignorance vs experience by dbIII · · Score: 1

    Who cares about uptime?

    The guys that are trying to run stuff while the system is down.
    This silly argument is really just the inexperienced arguing with the experienced. The experienced will say "what happens if I can't get the thing back up after a reboot today" because they will have seen enough hardware problems to have experienced that. The inexperienced will not care or even have a clue they should.
    For that reason I tend to wait until the end of the day for some systems. There's no reason for people to be sitting about waiting all afternoon for a server to come up just because I've rebooted it at lunch time and the file systems need four hours to sort themselves out.

    present technical reasons why not

    Because it might not start up again? Is that a good enough reason for you? My workplace lost electricity for a couple of days a month ago and it took four solid days to properly replace a machine that didn't come up again. If you have enough machines something typically dies each year and it often manifests as a machine not turning back on after it has lost power or after a reboot.
    That's why the experienced often only reboot when there's a time window to do so or something else to fall back on or if the machine isn't all that important anyway. The inexperienced do shit like having only a single MS Windows domain controller and rebooting that during peak working hours when the disk was listing a lot of hardware errors.

    So yes I reboot stuff but if a dozen people need it within five minutes I'm not going to do so unless it really needs it.
    Just about all the stuff I run gets shut down every six months anyway. That's when I'm ready to find problems and have time to deal with them properly.

  105. The cult of gratuitous uptime by OverflowingBitBucket · · Score: 1

    The cult of gratuitous uptime has always bewildered me.

    Eventually, you are going to run into a problem that you do not know how to solve on a live system. Even if you are arrogant enough to believe in your own omnipotence, eventually you are going to run into a problem that *cannot* be solved on a live system. And even if that never happens, a hardware failure is going to bring your system to a screaming halt one day.

    Sometimes the problem you are trying to solve will take some time to understand, and you need a solid understanding to plan how to properly fix an ongoing problem. Sometimes you have to hack a quick fix to see you through whilst properly planning your long-term fix.

    Sometimes, some clown screws up and takes something out that the live system needed. Sometimes, that clown is you.

    If you have been concentrating on keeping your amazing uptime, you have probably been neglecting any verification that your server is even *capable* of correctly booting back up, if you are ever forced into a corner and need to bring the system down. Discovering this, and attempting to fix it mid-crisis, is negligence at its worst. And leaving it to chance based on the belief you are a *nix-God and couldn't possibly be wrong?

    Being reasonably confident that you can shut down and bring up a system in a crisis is important. If you're not taking time to do this verification because you think that your excessive uptime demonstrates your hardcore *nix-guru-ness, you're just being negligent.

    1. Re:The cult of gratuitous uptime by tboulay · · Score: 1

      I kind of agree here. Most of what I've dealt with in the past has been fully redundant, mostly with load balancing and H/A. Any company that implements systems like this has at least quarterly windows where we would fail one side of the cluster in a purposefully "bad" way (i.e. pulling the power cables). So most of the best built systems I've ever worked on have no more than 90 or so days uptime.

      I have however had systems with more. 1284 days on a sun ultra 1 is the most. I've been a sys admin for a long time and I'm very much of the persuasion that I almost never have to reboot a unix box, and I'm almost always able to solve issues on the live system.

      The highest uptime boxes, however, had absolutely nothing to do with my pride in their uptime and pretty much everything to do with the fear of what would happen if this ancient little obscure system with software that hasn't been sold in 9 years went up in smoke on reboot.

  106. rebooting is a good idea say once a year by Anonymous Coward · · Score: 0

    Yahoo!: the Filing Test
    Reboot each server once a year to prove that won't cause a problem.

  107. Admin for programmers by Compaqt · · Score: 1

    Would you care to post a link to a good guide to sysadmin for the professional developers yet amateur administrators here?

    --
    I'm not a lawyer, but I play one on the Internet. Blog
  108. Via network on my machine requires a powerdown by NtwoO · · Score: 1

    My machine is unfortunately using Via chips. I've noticed that sometimes the network card will lock up. The only way of regaining network is actually powering down. A warm restart will not kick the chip back to life. Only a complete powerdown will do the trick.

    --
    ! /* */
  109. ICMP by Compaqt · · Score: 1

    That single reason is why some geeks are fond of Google as opposed to Microsoft.

    ping microsoft.com

    100% packet loss.

    Google (and Yahoo), on the other hand, have their servers configured correctly. They "get" it.

    --
    I'm not a lawyer, but I play one on the Internet. Blog
  110. Nonsense by Anonymous Coward · · Score: 0

    I never heard of such a myth... Maybe the opposite!

  111. Lack of runtime self tests by renoX · · Score: 1

    While I agree that reboots are not a full solution, I also think that you cannot currently do live FS & memory checking, things which are really nice to do periodically.

  112. Reboots help specific problems by JBv · · Score: 1

    We reboot servers on our (very small) installation.

    The most obvious case is when I change the configuration of the server in a non-trivial manner and need to make sure the whole server is consistent. A reboot is the quickest way to find out whether the changes stick and are consistent with the rest of the server software. It is better to test then than to wait for a reboot due to an unrelated problem (hardware failure, power cut, trip on the cable, etc.) and then have to figure out why the server is not working as expected.

    Some of our servers running commercial software also have to be rebooted periodically, due to bugs in the software that clog the server into a useless state if left alone. We could shut down all offending processes, clean the server state, and restart them, but a supervised reboot will reset the state of the server in a reproducible manner.

    Other than that, we reboot if and only if there is a security patch for the kernel that we have to apply or a critical firmware update.

    These are probably not best practices, but work in our environment.

  113. Hyperbole reigns supreme! by wkcole · · Score: 1

    Paul Venezia explains why you should almost never reboot a Unix server,

    Of course, that is not quite what the actual article really says. It's really just about breaking the habit that Windows "admins" have of using reboots as a means to clear up mysterious trouble. IMHO that habit marks one as an "operator" at heart rather than a sysadmin, but I digress...

    In addition, even what Venezia does say goes a bit too far, stating that anything short of a security problem in the kernel proper shouldn't be an excuse for a reboot. In the real world, many cases of rebooting after non-kernel updates may be technically unnecessary, but as a practical matter the reboot is the safest approach. It may be theoretically possible to track down each changed shared library, kernel module, or config file, determine all of the running processes that depend on them, and manually shut down and restart each one in the proper order, but attempting that is far more likely to cause extended dysfunction than a straight reboot after making major updates. There are also performance tuning issues that can force a reboot just to change a kernel parameter. Finally, on some platforms (notably Solaris) it is possible for pathological software (notably Oracle) to micro-manage memory in ways that over time cause the kernel's memory map to become an unmanageable jumble of tiny free, allocated, and locked blocks around a few big locked islands, such that the performance of an exec or a context switch is degraded significantly. The only ways out are a de facto reboot (take everything using memory down and then restart it all manually) or a real reboot. With a real reboot you can be sure to avoid human error, and you get a POST cycle as well.
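
    To give a sense of what "tracking down" processes that still use replaced libraries involves, here is a Linux-specific sketch. It relies on /proc/<pid>/maps marking unlinked files with "(deleted)"; the ".so" filter is a heuristic assumption, and the function names are invented for this example.

```python
import os

DELETED = " (deleted)"


def stale_libraries(maps_text):
    """Return shared-library paths that a maps listing marks as deleted."""
    stale = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode pathname
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.endswith(DELETED) and ".so" in path:
            stale.add(path[: -len(DELETED)])
    return sorted(stale)


def processes_needing_restart(proc_root="/proc"):
    """Map pid -> stale libraries for every process whose maps we may read."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as handle:
                stale = stale_libraries(handle.read())
        except OSError:
            continue  # process exited, or we lack permission
        if stale:
            result[int(entry)] = stale
    return result


# Hypothetical maps excerpt for a process started before a library upgrade.
sample = (
    "7f1c2a000000-7f1c2a1c0000 r-xp 00000000 08:01 1234 /usr/lib/libssl.so.1.1 (deleted)\n"
    "7f1c2a400000-7f1c2a5c0000 r-xp 00000000 08:01 5678 /usr/lib/libc.so.6\n"
    "7ffd1b000000-7ffd1b021000 rw-p 00000000 00:00 0 [stack]\n"
)
print(stale_libraries(sample))
```

    Even with such a list in hand, the restart order and the kernel modules and config files mentioned above still have to be handled by hand, which is the poster's argument for the plain reboot.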

  114. Last Post? by hicksw · · Score: 1

    There are two main reasons to reboot a server:

    (1) because you don't know what's wrong, or
    (2) because you do know what's wrong.

    A good admin may not be told enough about a Windows system.

    A poor admin may not be able to understand enough about a unix system.

  115. Failure in thought processes by dbIII · · Score: 1

    How is one a mistake and dozens a strawman?

    OK then - it's actually even a hell of a lot worse than that. If you've really got something that isn't a complete lie where you have DOZENS of faulty changes to init scripts you are blaming one fairly innocent bit of process with a catastrophic failure in DOZENS of others.
    Idiots that do not check their work and (your words) make DOZENS of mistakes really have nothing at all to do whether it's a good idea to reboot or not.
    If you really had a fucking clue what you were talking about instead of trying to build an unlikely argument out of bullshit you would know that init scripts can be tested without a reboot and you would consider those that don't do it on important machines idiots.
    What sort of idiot changes an init script on anything of any importance without seeing if it works or not? If it does require finding a window to reboot then do it and none of this fucking made up bullshit of DOZENS of unchecked init scripts that contain mistakes. So what's that to get DOZENS then - over one hundred changed init scripts on one machine and a quarter of them with errors in them?
    So yes - that scale indicates either bullshit or complaining about what is really a massive QA failure.

    1. Re:Failure in thought processes by Dennis+Sheil · · Score: 1
      Yes, dozens. The original discussion is whether servers should be rebooted regularly (once a week/month or whatever) or should stay up as long as possible. If every other week a bad rc/init script is put in (and in my experience the average was higher), within 48 weeks you will have dozens of bad rc/init scripts. Thus "you have DOZENS of faulty changes to init scripts" as you put it. Correct. As I said before, some shops have one developer and a few machines in one room, some have tens of thousands of servers all over the world, and developers numbering in the high hundreds to low thousands - I have worked in both types of environments. Not only can this happen in a large environment, it does happen.

      As far as "those that don't do it on important machines" that you talk of - naturally, this would tend to happen more on non-production, non-critical machines. I am aware that rc/init scripts, if they were ever installed in the first place, can be tested without rebooting. Developers do not always do this though. Dependencies (e.g. making sure your application starts after the database if it depends on the database and needs it running before the particular application is started) are a little more tricky to check but can be dealt with if some thought is put into it by the developer. Thought is not always put into it though.

      You ask: "So what's that to get DOZENS then - over one hundred changed init scripts on one machine and a quarter of them with errors in them?" One machine? I am talking about thousands of machines, most of them with a number of (sometimes many) non-system rc/init scripts, touched by hundreds of developers, who are always changing their programs, many of the machines non-production, over the period of time discussed - as long as a machine can manage to stay up - which can be years. As I said in my original post, I am talking about large (financial or otherwise) companies, not small shops.

      If you're a sysadmin, what's the largest number of machines you've ever had to support in a developer-heavy environment? If your answer was in the thousands, you would probably be less skeptical. This article says Facebook has 30k servers, Rackspace has 50k servers, Akamai has 60k servers, and Google has over 1000k servers. I can assure you that these companies regularly deal with server problems that the standard sysadmin with dozens of servers to look over is not used to - even a company with one quarter of Facebook's servers is going to have a different sort of quality to their systems setup than a company with 150 or so servers.

  116. Misleading - so what is your real point? by dbIII · · Score: 1

    Since you've missed the point, let's look at the steaming disaster of vast incompetence (suddenly discover there are dozens of processes which were running which were never put into init/rc files properly) you are using as an example and then examine what you are pretending it proves.
    Your example is people messing with scripts that determine what a system does on startup. It is universally agreed that it is a good idea to test such things properly and often with a test restart in all but extreme circumstances.
    For some reason you appear to be saying that this example applies in situations where typically nobody thinks it is a good idea to test such things with a test restart.
    You've also gone on and described a situation where the systems are so critical that they cannot be restarted yet not important enough for any attempt at getting the startup scripts right - what can you possibly expect other than cries of BULLSHIT!
    The waffling about size is just a handwaving distraction and an attempt to look impressive. If you've got large numbers of machines you typically have both good practices in place (instead of the bullshit above about dozens of unexpected problems per outage) AND the leeway to properly get a change window on each affected machine to make sure the things are going to actually come up. You've probably also got a pile of machines identical apart from the hostname and network address and could afford to lose one for a few minutes to see if the init scripts work.

    You've got an example where the things actually do need to be rebooted and just about everyone agrees as a bait and switch for some stupid argument to "reboot machines once a week".

    With your IT shop from hell example rebooting everything once a week just in case is not going to do anything to save it - it's nothing but a faulty strawman argument that really proves a point about QA instead of what you are pretending it proves.

  117. Re: point by Dennis+Sheil · · Score: 1
    On some level this becomes a discussion along the lines of, which is better, vi or emacs? Obviously people have different ideas on how often one should reboot. For me, it also depends on the shop - different policies are good for different types of shop.

    Regarding "a situation where the systems are so critical that they cannot be restarted yet not important enough for any attempt at getting the startup scripts right" - as I said in my original post, it can be that Monday to Friday are to some extent critical, and Saturday is not.

    With regards to test reboots being done - sometimes they can be, but sometimes a new application can be put on a machine which has been around for a year, with many other developers using the machine, many dependencies and so forth. Thus in some situations, people would have to wait until the weekend for the machine to reboot.

    Anyhow, this is how it is in some places. People have different opinions about the wisdom of such policies, such as yourself. I've said about as much as there is to say on the topic, anything more and I would be repeating myself.

  118. No troubleshooting? by Anonymous Coward · · Score: 0

    Too often the problem is that the reboot is seen as a way to get the system "working" like it was before. But in reality the system isn't working right at all. Spending a day with a broken box to diagnose a recurring problem is usually the way to go. But too often the reboot hides any true root cause.

    I've worked with a few clients that got mad when I fixed a misconfiguration. They didn't say anything but I could tell they had egg on their face since they had been blaming the vendor for the last five years. I was tired of rebooting.

  119. Old but true: windows re-boot, unix be root ... by Anonymous Coward · · Score: 0

    Old but true: windows re-boot, unix be root ...

  120. How timely by choke · · Score: 1

    I just fired a client who insisted on monthly reboots of their linux boxes. (and mangled their udev, and insisted on using a vcd to boot single user because grub was 'dangerous' ...)

    I won't stop anyone from being stupid, but I am damned sure not going to do it for them.

    --
    "No good deed goes unpunished"
  121. Re:It's a persistent myth that slashdot is for ner by choke · · Score: 1

    Sometimes it's not hardware.

    I just terminated a contract where the client insisted on mounting their SAN volumes in a way that made booting fail about half the time.

    They had no idea what they were doing, but insisted on doing it in a bad/broken way and were more interested in posturing than solving problems.

    --
    "No good deed goes unpunished"