Server Uptimes Ranked
Bex writes "Ever wonder how your server uptime is when compared to others with
different operating systems? Ever wanted some hard numbers for the
Linux vs NT or FreeBSD debates? Check out uptime.hexon.cx
for a list of servers and some interesting numbers on uptimes. It looks like FreeBSD stomps all over everybody else, with a whopping 1994 days of uptime for one server, and a 138 day average uptime. NetBSD is second for max uptime, but better on average. Windows 2000 is second to last, just barely beating out BeOS with a paltry record of 49 days. Its average uptime was under a week!
Remember, downtime doesn't always mean a crash, but it is a good indication of how often a machine needs maintenance." Update: 12/30 10:33 by R : There's a new version of the uptime page here.
Given that IBM's OS/2 and Microsoft's various Windows operating systems shared a similar code base, doesn't it seem weird that OS/2 never seemed to suffer this problem?
In fact, it did.
In 1997, our team at Ford Motor Company noticed that, after 6 to 8 weeks of operation, our OS/2 v2.11 machines would begin executing once-per-day tasks several times a day, and after several such executions, the computers would crash. Further analysis showed that those tasks were being executed every 1h09m54s. I spent a day trying to manipulate that number into something meaningful, and gave up in frustration. We assumed that our own code was to blame, and rewrote it several times (to no avail).
After rewriting our code seemed to have no effect, we decided to install the latest set of O/S patches on our machines. On a Sunday, we moved between machines that were scattered over a several hundred acre manufacturing facility.
Black Monday
Seven weeks and one day later, the facility started building units. Within hours, the OS/2 systems started showing the symptoms that led to the crash, in the exact order in which we upgraded them!
The coincidence was too much to escape notice, so we called IBM Technical support. Their Level 1 guy spent about five minutes talking to us before he realized this was a deep O/S problem, and we were kicked to Level 2 support. The Level 2 person heard our version of the events ("The machine flakes out and every 1h09m54s executes a task that should only happen once a day"), and asked, "Are the machines crashing 49.7 days after being rebooted?"
BINGO!
Apparently, someone inside IBM had noticed this problem a few weeks before we did, and had the patch in final testing when we called. (I think the patch was #XR2011 or #XR2014). However, since we were a customer, our bug report took priority over theirs.
The Problem
Someone used an unsigned 32-bit integer to record the number of milliseconds since the O/S was booted. That number rolls over after 49 days, 17 hours, 2 minutes, 47.296 seconds. The symptoms we saw began the day before crash day because of the rollover that occurred when our code scheduled a task for "tomorrow".
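For anyone who wants to check the arithmetic, here is a rough Python sketch of the rollover. It is not the OS/2 code or our scheduler; the names (`wrap32`, `schedule_tomorrow`, `is_due_naive`) are made up for illustration, and the naive "is it due yet?" comparison is only an assumption about how a scheduler could get bitten by a wrapped counter.

```python
# Illustrative sketch only: hypothetical names, not the actual OS/2 or Ford code.

COUNTER_BITS = 32
ROLLOVER_MS = 2 ** COUNTER_BITS   # an unsigned 32-bit counter wraps at this value
MS_PER_DAY = 86_400_000


def split_duration(ms):
    """Break a millisecond count into (days, hours, minutes, seconds)."""
    days, rem = divmod(ms, MS_PER_DAY)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    return days, hours, minutes, rem / 1000


def wrap32(ms):
    """Emulate storing a value in an unsigned 32-bit integer."""
    return ms % ROLLOVER_MS


def schedule_tomorrow(now_ms):
    """Hypothetical scheduler: due time is 'now + 1 day', kept in 32 bits."""
    return wrap32(now_ms + MS_PER_DAY)


def is_due_naive(now_ms, due_ms):
    """Assumed naive check that ignores wraparound."""
    return now_ms >= due_ms


if __name__ == "__main__":
    print(split_duration(ROLLOVER_MS))

    # One second into the last day before the counter wraps.
    now = ROLLOVER_MS - MS_PER_DAY + 1000
    due = schedule_tomorrow(now)
    print(now, due, is_due_naive(now, due))
```

The first print gives the 49 days, 17 hours, 2 minutes, 47.296 seconds figure. In the second case the "tomorrow" due time wraps around to a small number, so the naive check reports the task as due immediately, a full day before the counter itself rolls over. As a side note on arithmetic only, `split_duration(2 ** 22)` comes out to 1 hour, 9 minutes, 54.304 seconds; whether that is related to the 1h09m54s interval we saw, I can't say.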
The Moral
It's too bad that Microsoft and IBM were not on speaking terms at this point. If they had still been working together, MS would have had a fix two years ago.
Russ B.
Chivalry is not dead, it's just frequently misspelt. - M. Langley
It measures nothing. In particular I really doubt that most Solaris people come from NT backgrounds. Rather, most people running serious machines will simply not bother to install some random daemon to let other people know information that they don't particularly want to give away. I can see Solaris machines with 285 day uptimes from here, and they aren't particularly special.
I also must take issue with the `uptime being a point of pride' thing. If the machine doesn't have any particular state (say it's an NFS server), and it's going to take some time to work out what's wrong with it, the professional thing to do is just reboot it: only some idiot who isn't accounting for their time properly is going to spend half a day trying to work out what's wrong with it without rebooting first to make sure it's not something transient.
This survey is worthless.
Windows 2000 is second to last, just barely beating out BeOS with a paltry record of 49 days.
Let's at least be fair here. Windows 2000 has not even been released yet. 49 days ago they were at beta 2 or something like that. Let's at least wait until it's released before we start bitching about how much it sucks. I mean, we don't want to sink to their level and start spreading FUD now do we??
Please update the story on the main page.
The Uptimes project has moved to http://www.uptimes.net/
Also the protocol has changed, so everyone going to http://uptime.hexon.cx/ will be downloading and running old clients.
Come on, Slashdot people, research a story for 5 seconds before you post it.
Phillip
Uptime is irrelevant for home users because of a few simple facts:
Home users turn off their computers at night. Most of the Windows users I know aren't running mail servers or FTP servers that require constant uptime, so they power down at night to save some pennies on the juice bill.
Home users don't have uninterruptible power supplies. If the power goes out, the last thing they want to be doing is sitting in front of their computer. The $100 investment just doesn't make sense for them, and thus, they experience downtime with every power drop.
Home computers are used by children. Your spiffy FreeBSD machine is probably locked in a wiring closet somewhere, well away from six year olds with a penchant for DirectX games and dripping their Cokes on the keyboard.
Home computers are moved around. It might sound odd, but you're much more likely to shut down and pull the plug on a home system than a server just to move it over a few feet or to clean underneath it.
I'm not meaning to slam Windows as a home operating system, but isn't it fair to say that Windows (all flavors, even NT) has more home users than FreeBSD? Isn't it therefore safe to assume that, if you really have an accurate survey of uptime, Windows will naturally be lower? Just something to keep in mind.
What's your damage, Heather?