Server Uptimes Ranked

Posted by ryuzaki0 on Wednesday December 29, 1999 @09:47PM from the you-can-prove-anything-with-statistics dept.

Bex writes "Ever wonder how your server uptime is when compared to others with different operating systems? Ever wanted some hard numbers for the Linux vs NT or FreeBSD debates? Check out uptime.hexon.cx for a list of servers and some interesting numbers on uptimes. It looks like FreeBSD stomps all over everybody else, with a whopping 1994 days of uptime for one server, and a 138 day average uptime. NetBSD is second for max uptime, but better on average. Windows 2000 is second to last, just barely beating out BeOS with a paltry record of 49 days. Its average uptime was under a week! Remember, downtime doesn't always mean a crash, but it is a good indication of how often a machine needs maintenance." Update: 12/30 10:33 by R : There's a new version of the uptime page here.

17 of 396 comments

Re:Notice something familiar about MS uptimes? by rebill · 1999-12-29 21:38 · Score: 5

Given that IBM's OS/2 and Microsoft's various Windows operating systems shared a similar code base, doesn't it seem weird that OS/2 never seemed to suffer this problem?
In fact, it did.
In 1997, a team at Ford Motor Company had noticed that, after 6 to 8 weeks of operation, our OS/2 v2.11 machines would begin executing once-per-day tasks several times during a day, and after several such executions, the computers would crash. Further analysis showed that those tasks were being executed every 1h09m54s. I spent a day trying to manipulate that number into something meaningful, and gave up in frustration. We assumed that our own code was to blame, and rewrote it several times (to no avail).
After rewriting our code seemed to have no effect, we decided to install the latest set of O/S patches on our machines. On a Sunday, we moved between machines that were scattered over a several hundred acre manufacturing facility.
Black Monday
Seven weeks and one day later, the facility started building units. Within hours, the OS/2 systems started showing the symptoms that led to the crash, in the exact order in which we upgraded them!
The coincidence was too much to escape notice, so we called IBM Technical support. Their Level 1 guy spent about five minutes talking to us before he realized this was a deep O/S problem, and we were kicked to Level 2 support. The Level 2 person heard our version of the events ("The machine flakes out and every 1h09m54s executes a task that should only happen once a day"), and asked, "are the machines crashing 49.7 days after being rebooted?"
BINGO!
Apparently, someone inside IBM had noticed this problem a few weeks before we did, and had the patch in final testing when we called. (I think the patch was #XR2011 or #XR2014). However, since we were a customer, our bug report took priority over theirs.
The Problem
Someone used an unsigned 32-bit integer to record the number of milliseconds since the O/S was booted. That number rolls over after 49 days, 17 hours, 2 minutes, 47.296 seconds. The symptoms we saw began the day previous to crash day due to the rollover that occurred when our code scheduled a task for "tomorrow".
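The rollover figure above can be checked with a little arithmetic. The following is a minimal sketch, not IBM's or anyone's actual code: the helper names (`split_ms`, `ticks_at`) are made up for illustration, and the "naive check" at the end is only an assumption about the kind of comparison that could make a task scheduled for "tomorrow" look overdue as soon as the counter is about to wrap.

```python
# Illustrative only: how long an unsigned 32-bit millisecond counter lasts.
COUNTER_BITS = 32
WRAP_MS = 2 ** COUNTER_BITS  # number of distinct values before wrapping to 0
DAY_MS = 24 * 60 * 60 * 1000


def split_ms(total_ms):
    """Break a millisecond count into (days, hours, minutes, seconds, ms)."""
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds, ms


def ticks_at(ms_since_boot):
    """Value a 32-bit counter would report after ms_since_boot milliseconds."""
    return ms_since_boot % WRAP_MS


rollover = split_ms(WRAP_MS)
print(rollover)  # (49, 17, 2, 47, 296)

# Hypothetical scheduler: 12 hours before the wrap, schedule a task for "tomorrow".
now = WRAP_MS - DAY_MS // 2
due = ticks_at(now + DAY_MS)   # the due time wraps around to a small number
naive_is_due = now >= due      # a naive comparison thinks the task is already due
print(naive_is_due)            # True
```

Under that assumption, a task scheduled within the last day before the wrap gets a due time that is numerically smaller than the current tick, which would be consistent with the symptoms showing up the day before the crash.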
The Moral
It's too bad that Microsoft and IBM were not on speaking terms at this point. If they had still been working together, MS would have had a fix two years ago.
Russ B.

--
Chivalry is not dead, it's just frequently misspelt. - M. Langley
Re:What does this measure, really? by tfb · 1999-12-29 20:38 · Score: 5

It measures nothing. In particular I really doubt that most Solaris people come from NT backgrounds. Rather, most people running serious machines will simply not bother to install some random daemon to let other people know information that they don't particularly want to give away. I can see Solaris machines with 285 day uptimes from here, and they aren't particularly special.

I also must take issue with the `uptime being a point of pride' thing. If the machine doesn't have any particular state (say it's an NFS server), and it's going to take some time to work out what's wrong with it, the professional thing to do is just reboot it: only some idiot who isn't accounting for their time properly is going to spend half a day trying to work out what's wrong with it without rebooting first to make sure it's not something transient.

This survey is worthless.
Windows 2000 Not Out Yet by seaportcasino · 1999-12-29 17:01 · Score: 5

Windows 2000 is second to last, just barely beating out BeOS with a paltry record of 49 days.

Let's at least be fair here. Windows 2000 has not even been released yet. 49 days ago they were at beta 2 or something like that. Let's at least wait until it's released before we start bitching about how much it sucks. I mean, we don't want to sink to their level and start spreading FUD now do we??
Some thoughts by Dacta · 1999-12-29 17:03 · Score: 4

I don't think it is really fair to bag Windows 2000 for having an average uptime of 5 days. Don't forget this is a cutting edge MS operating system, and you are going to need reboots for upgrades. It would be fairer to judge that six months or so after the release.
I do wonder about the 49.2 max uptime for Win2000 & 95, though. There was a bug in Win95 that would crash it after roughly that amount of time (can't remember the exact number of days) - the Win2000 uptime looks suspiciously close to that, too.
I was a little surprised about the BeOS stats, too, until I realised there were only 4 BeOS machines in the survey. No Macs, either.
There is also no way to compare what the machines were doing. A hardcore development or games machine is much more likely to crash (or reboot) often than a machine doing nothing.
Conclusion? Interesting, but don't read too much into the results. It is nice to see some of the really high uptimes, though.
ugh by pmsyyz · 1999-12-29 17:04 · Score: 5

Please update the story on the main page.

The Uptimes project has moved to http://www.uptimes.net/

Also the protocol has changed, so everyone going to http://uptime.hexon.cx/ will be downloading and running old clients.

Come on, Slashdot people, research a story for 5 seconds before you post it.

--
Phillip
Well, Solaris doesn't really understand kill -9 by hatless · 1999-12-29 21:14 · Score: 3

Having admined a few Solaris boxen for 5 years, one thing I found irritating was the way an errant TCP/IP application--say, Netscape Enterprise Server--could get stuck in the middle of handling a request and end up unkillable. In order to release the port, the only remedy--I swear, ask Sun--is to reboot. Nothing you can do with kill, with proc tools, or by restarting networking services, will kill a process in such a state, at least through Solaris 2.6.
About as good as slashdot polls by Zaffle · 1999-12-29 17:11 · Score: 3

These statistics are about as good as Slashdot polls (i.e. if you really believe them, you need help). The big point is that only those who actually submit their times are counted, so, like magazine polls, the sample is heavily biased. Case in point: a certain magazine in the US (I'm not sure which one) a while back predicted John F Kennedy wouldn't win the election. But that was based on their own magazine poll, and their magazine was targeted towards the (for lack of a better word) upper class.
These are hardly what I would call "hard numbers". They are even worse than a Microsoft-sponsored comparison.
Don't get me wrong, I love Linux, and I firmly believe that yes, Windows IS miserable for uptime, and yes, FreeBSD does kick ass with uptime. (Linux has more frequent kernel releases, and we haven't yet figured out a way to upgrade your kernel w/o rebooting.)
I'm surprised that an article phrased in such a way that it sounds as if it's supposed to be serious would be posted on Slashdot.
That's enough ranting, I'll probably get moderated down as a troll. But seriously, tune out anyone who quotes these numbers as reliable.

--

I used to have a funny sig, but slash cut it off, and I forgot what the punchline was.
What does this measure, really? by trance9 · 1999-12-29 17:14 · Score: 4

You might think, at first, that this measures the reliability of the OS. However there are some other factors here besides the OS, the main one being the competence of the administrator.

The clue here is that Solaris has a much worse uptime than the other Unixes. Yet we all know that Solaris is a damn fine product, and I've seen some Solaris boxen with amazing uptimes.

So why does it perform so poorly here?

I think the answer is that the average Solaris admin comes from an NT background and believes that reboots solve a problem. You get some of these people in the Linux stats too.

Now look at BSD. Who runs BSD? Old guard Unix people, who generally have their sh*t together, and know the hell what they're doing. These are the kind of people for whom uptime is a point of pride, who take it as a grave personal failing if they have to reboot to solve a software problem.

So while I don't doubt that BSD is a robust and stable OS, I think that to some extent the uptime stats reflect the average level of experience of the admins, and not just the robustness of their OS.

I would guess Solaris makes a much better showing if you can eliminate this effect. BSD would still presumably edge out Linux (since uptime is what BSD developers and users strive for, I think the OS provides it), but not, I think, by a 2:1 margin.
Uptime median skew by Blu3Viper · 1999-12-29 17:16 · Score: 3

If you have two FreeBSD boxes, sitting in the closet since 1995, they have significant uptime even averaged between themselves. If you have 5,002 Linux boxes, two sitting in the closet since 1995 and 5,000 rebooted on odd chance, you have a heavily skewed bias.

Without basis on why the machines are up/down and factoring that into the averages, it's merely pretty pictures.
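To put rough numbers on the two-boxes-versus-5,002 scenario, here is a small sketch. The uptime values (`LONG_RUNNER_DAYS`, `CASUAL_DAYS`) are invented for illustration and are not taken from the survey; only the host counts come from the example above.

```python
from statistics import mean, median

# Hypothetical figures, chosen only to illustrate the skew.
LONG_RUNNER_DAYS = 1700   # a box left alone in the closet since 1995
CASUAL_DAYS = 10          # a box rebooted "on odd chance"

freebsd = [LONG_RUNNER_DAYS] * 2
linux = [LONG_RUNNER_DAYS] * 2 + [CASUAL_DAYS] * 5000

freebsd_mean = mean(freebsd)
linux_mean = mean(linux)
linux_median = median(linux)

print("FreeBSD mean uptime:", freebsd_mean)
print("Linux mean uptime:  ", round(linux_mean, 2))
print("Linux median uptime:", linux_median)
print("Same best box?      ", max(linux) == max(freebsd))
```

Both groups contain identical long-running machines, yet the Linux average collapses toward the casual boxes because they outnumber the others 2,500 to 1. The average says more about who registered than about the operating system.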

I have Linux boxes filtered and firewalled that have been up for years. Due to denial of service attacks to the vulnerable kernels they are running, I can't and won't post them. I will however say that two of these boxes were listed as #1 and #3 on the previous uptime site a couple of years ago. #1 had in excess of 500 days when the site disappeared.

Depending on a box's job, it may be rebooted several times a day or it may be up for months at a time. I do a lot of code development and testing in/out of the kernel so I have a lot of boxes that get rebooted. I also have a lot of boxes that gather dusty electrons month by month. A few of the boxes I build kernels for crash. Dev kernels do that once in a while. By far however, the systems are completely stable.

All of my machines that lost large uptimes lost it 100% due to power loss.

Let's try and view these figures with an understanding that the Linux boxes outnumber all the others combined by a large number. I'm willing to bet that most of these Linux boxes are personal machines rather than black box setups.

-d
1. Re:Uptime median skew by twit · 1999-12-29 21:29 · Score: 3
  
  Very good point.
  
  I would like to see more advanced statistical data with respect to this. I would suspect that the average uptime of a Linux box is a bimodal distribution, with hobbyists representing the first, larger mode with shorter uptime, and professional administrators representing most of the second, smaller mode with longer uptime. The first mode would dominate the other, and the arithmetic mean and median would be pulled towards it through no fault in the operating system.
  
  It is interesting, however. I can see the smaller mode being constrained by the release of security fixes (and I wonder about the very long life FreeBSD boxes as well). I do think, however, that the best purpose of an uptimes study would be to find artificial constraints on uptimes, as represented by abnormal distributions.
  
  This could have pointed out the 49-day limit in MS operating systems well in advance of it being reported by MS proper, for example. If a kernel bug in Linux (or any other operating system) caused frequent crashes over time, it would reveal itself in the distribution of uptimes. Like I said above, the point should be to improve our (and by "our" I mean everyone's, from BSD to Linux to Windows) OS reliability, not merely to dick-size about ludicrously long uptimes.
  
  Perhaps this calls for a more advanced massaging of data from the uptimes people :). I wouldn't mind helping. It's a project well worth the effort.
  
  --
  There is no premature anti-fascism. -Ernest Hemingway
How to lie with statistics.... by Chilles · 1999-12-29 17:30 · Score: 3

I feel the need to bitch about some of the numbers I see. I'd like to know what job those systems are doing: BSD computers more often than not are a network server or something similar, and you don't just turn that off. On the other hand, I turn my win NT machine at work off every evening (uptime roughly 8.5 hours?). Or does uptime count the number of hours the system is running between crashes? In that case my BeOS machine at home must now be somewhere around 500 days or so. I upgraded it a few times but it never ever crashed on me.
Putting win2k in this statistic is of course ridiculous: the OS that has been out for 50 days has a maximum uptime of 49 days. Well... *duh*
To make a statistically valid comparison of uptimes you'd have to use the same number of systems for each OS, not well over 500 for one and just under 20 for several others. In a larger population you are naturally going to see more extremes. I bet the record for shortest uptime can also be found in either the Linux or FreeBSD groups. The averages of course tell us something, but in the really small populations they too are irrelevant.
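The point about larger populations showing more extremes can be made concrete under one loudly stated assumption: if uptimes were independent and exponentially distributed with the same mean for every OS (real uptimes certainly are not that tidy), the expected maximum of n hosts is the mean times the n-th harmonic number. The 30-day mean below is made up; the group sizes echo the "well over 500" and "just under 20" in the comment.

```python
def harmonic(n):
    """n-th harmonic number: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_max_uptime(mean_days, hosts):
    """Expected largest value among `hosts` i.i.d. exponential uptimes.

    Assumes an exponential model, which is only a convenient illustration.
    """
    return mean_days * harmonic(hosts)


MEAN_DAYS = 30  # hypothetical, identical for both groups

big_group = expected_max_uptime(MEAN_DAYS, 500)
small_group = expected_max_uptime(MEAN_DAYS, 20)

print("expected record, 500 hosts:", round(big_group, 1), "days")
print("expected record,  20 hosts:", round(small_group, 1), "days")
```

With identical reliability, the larger group is expected to post a record nearly twice as long, purely because it has more chances at an outlier.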
I'd like to see this uptime project become bigger amongst users of less uptime centered operating systems so that the statistics become a bit more valid.

The only thing this chart tells me is that *BSD and Linux users are more concerned with statistics like this than users of other operating systems.
What would psychologists make of that?
Why uptime is irrelevant... by Brento · 1999-12-29 17:56 · Score: 5

Uptime is irrelevant for home users because of a few simple facts:

Home users turn off their computers at night. Most of the Windows users I know aren't running mail servers or FTP servers that require constant uptime, so they power down at night to save some pennies on the juice bill.

Home users don't have uninterruptible power supplies. If the power goes out, the last thing they want to be doing is sitting in front of their computer. The $100 investment just doesn't make sense for them, and thus, they experience downtime with every power drop.

Home computers are used by children. Your spiffy FreeBSD machine is probably locked in a wiring closet somewhere, well away from six year olds with a penchant for DirectX games and dripping their Cokes on the keyboard.

Home computers are moved around. It might sound odd, but you're much more likely to shut down and pull the plug on a home system than a server just to move it over a few feet or to clean underneath it.

I'm not meaning to slam Windows as a home operating system, but isn't it fair to say that Windows (all flavors, even NT) has more home users than FreeBSD? Isn't it therefore safe to assume that, if you really have an accurate survey of uptime, Windows will naturally be lower? Just something to keep in mind.

--
What's your damage, Heather?
"Long uptime is evil" or "gee isnt my system open" by Kaptain+Krash · 1999-12-29 18:02 · Score: 3

Think about this. Some person is amazingly proud of the fact they are running kernel 2.0.18. How many vulnerabilities does this server have? What is kernel 2.0 up to? 2.0.37?? How many DOS attacks is this thing vulnerable to??
I have no idea about FreeBSD but i am guessing that either it has less time between kernel updates, the version our leader is running is the last in the stable series and FreeBSD has moved on to a new series or it too is vulnerable to any bugs or attacks that have been fixed in newer kernels.
I am proud of the fact that my servers have an uptime of only 30 days or so. Because I know that I am performing regular maintenance on them. They crash rarely, usually due to hardware failure, but I reboot them frequently to make sure they are running all the latest fixes, e.g. a new kernel install.
This is like saying "WooHoo my bog standard RedHat 5.0 box has been up for 2 years!!" Crackers ahoy! Vulnerable target sighted!! A quick search of any crack DB will give you root access in less time than it will take you to make a cup of coffee.
I would expect any NT box should have a maximum uptime dating back to the release of Service Pack 5. (Don't know about 6. A few admins I know are avoiding that like the plague.)
The same applies to Linux and/or FreeBSD or whatever. If you fail to apply critical patches to your system, most likely in production use, why in god's name should you get kudos off the hacker/admin community for a job well done??
Kaptain Krash.
My other .sig is a 4000 line perl program.
One useful application ... by nakaduct · 1999-12-29 18:22 · Score: 3

There's lots of [justified] griping above, pointing out how you can't draw conclusions about any OS's stability based on the longest runtimes.
While that's true, this kind of survey does give us maximum runtimes, and I don't think that's available anywhere but here.
For example, maybe a few posters could close their blathering pieholes long enough to see that the 49-day figure applies to Windows _anything_, not just Win2k. For a startling revelation, go to The List and click "All" under "Alltime".
There are dozens of 49-day, 17h02m uptimes for Win32, and none longer. Obviously, either the OS or some popular [driver|service|screensaver] is broken [insert dumb "already knew that" joke here]. I dimly recall Microsoft claiming this was fixed in an NT service pack; obviously that's not the case.
For a more subtle trend, you can see a clump of Linux boxes topping out at 497 days, 02h27m. This is 2**32 Intel jiffies (100ths of a second; Alpha jiffies are 1024ths of a second) -- if you're running a module that assumes the jiffy count is always increasing, you'll get weird happenings when the counter rolls over. Again, I dimly recall one of the kernel people suggesting the jiffy counter be initialized (at boot) to MAX_JIFFIES - 3600, so that every module author writes code that will handle a rollover.
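The 497-day figure and the rollover hazard can be sketched as follows. This is an illustration in Python rather than kernel C; `tick_after` is a made-up helper written in the spirit of the Linux kernel's `time_after()` macro, which compares tick counts via their wrapped difference, and the 100 Hz rate is the Intel value quoted above.

```python
HZ = 100          # jiffies per second on Intel, per the comment above
WRAP = 2 ** 32    # a 32-bit counter wraps after this many ticks


def split_seconds(total):
    """Break whole seconds into (days, hours, minutes, seconds)."""
    minutes, seconds = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds


def tick_after(a, b):
    """True if 32-bit tick `a` is later than tick `b`, tolerating one wrap.

    Only valid while the two ticks are less than half the counter range apart.
    """
    diff = (a - b) % WRAP
    return 0 < diff < WRAP // 2


jiffy_limit = split_seconds(WRAP // HZ)
print(jiffy_limit)  # (497, 2, 27, 52)

# A timeout set 50 ticks ahead, just 10 ticks before the counter wraps.
now = WRAP - 10
deadline = (now + 50) % WRAP           # wraps around to 40

naive_expired = now > deadline         # True: wrongly reports the timeout as passed
safe_expired = tick_after(now, deadline)  # False: 50 ticks still remain
print(naive_expired, safe_expired)
```

Code that compares raw counter values works for 497 days and then misfires, which is why starting the counter close to its maximum, as suggested above, would expose such bugs within the first hour instead.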
Faults that only appear after a long runtime are typically easy to fix, but almost impossible to detect. Right now, the survey doesn't filter out shutdowns for known reasons, or collect enough info from the client (what modules are running, etc.).
If that changed, it could be a real goldmine, both to software maintainers, and to those who want to know when their system is due for its next crash.
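As a rough idea of what mining the survey for such ceilings might look like, here is a minimal sketch. The function name, the thresholds, and both sample lists are invented for illustration; none of the numbers are real survey data.

```python
def crowded_ceiling(record_uptimes_days, window_days=0.5, min_hosts=3):
    """Return (ceiling, hosts_near_it) if several hosts pile up at the maximum.

    A natural spread of uptimes usually has a thin tail with one lonely record
    holder; many hosts tied at the top hints at an artificial limit.
    """
    top = max(record_uptimes_days)
    near_top = [u for u in record_uptimes_days if top - u <= window_days]
    if len(near_top) >= min_hosts:
        return top, len(near_top)
    return None


# Made-up per-host record uptimes, in days.
win32_records = [3.0, 8.5, 12.0, 21.0, 30.2, 49.6, 49.7, 49.7, 49.7]
unix_records = [4.0, 15.0, 60.0, 132.0, 285.0, 510.0]

win32_ceiling = crowded_ceiling(win32_records)
unix_ceiling = crowded_ceiling(unix_records)

print("Win32:", win32_ceiling)  # (49.7, 4)
print("Unix: ", unix_ceiling)   # None
```

A real analysis would want the shutdown reasons and module lists mentioned above before blaming the OS, but even this crude check separates a pile-up at one value from an ordinary long tail.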
cheers,
mike
Notice something familiar about MS uptimes? by Wakko+Warner · 1999-12-29 18:56 · Score: 3

The highest for any of them is 49 days. Remember that 49-day uptime bug in Windows 95 and NT? It appears to exist in Windows 2000 as well.

Does this qualify as "show-stopper"?

- A.P.
--

"One World, one Web, one Program" - Microsoft promotional ad

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Very bad sample by Penrif · 1999-12-29 19:03 · Score: 3

"Nr. of hosts registered per OS"

This tells me that they aren't using a random sample, but rather you have to actively register your box to have your uptimes scored, which doesn't indicate that the sample of people is very diverse first of all. Also, the sample size for some of the OS's is horrible. A few examples:
Windows NT 71
Windows 95 30
BeOS 5
And people are actually making remarks about BeOS's performance when only 5 people have contributed their uptimes to this study? As far as I'm concerned, the only samples that are worth jack are Linux and FreeBSD (590 and 137, respectively), and even then I don't trust the results because of the first point I brought up.

Man, sometimes you just gotta look at the numbers.
Too Bad. by Delta-9 · 1999-12-29 21:27 · Score: 3

This discussion will be null and void in about 32 hours because the world is going to lose power and blow up.

Or so the media wants us to think.

So the real statistic should be:

How many FreeBSD machines are connected to an EPS (eternal power supply) and can update uptimes.net via floppy disk, since the internet won't work either.

ok ... maybe not.