Server Runs Continuously For 24 Years (computerworld.com)


Posted by EditorDavid on Saturday January 28, 2017 @12:34PM from the zero-downtime dept.

In 1993 a Stratus server was booted up by an IT application architect -- and it's still running. An anonymous reader writes: "It never shut down on its own because of a fault it couldn't handle," says Phil Hogan, who's maintained the server for 24 years. That's what happens when you include redundant components. "Over the years, disk drives, power supplies and some other components have been replaced but Hogan estimates that close to 80% of the system is original," according to Computerworld.
There's no service contract -- he maintains the server with third-party vendors rather than going back to the manufacturer, who says they "probably" still have the parts in stock. And while he believes the server's proprietary operating system hasn't been updated in 15 years, Hogan says "It's been extremely stable."
The server will finally be retired in April, and while the manufacturer says there are more Stratus servers that have been running for at least 20 years, this one seems to be the oldest.


Not "continuously" in the geek sense of the word by suso · 2017-01-28 12:36 · Score: 4, Insightful

"It never shut down on its own because of a fault it couldn't handle," said Hogan. "I can't even think of an instance where we had an unplanned shutdown," he said.

This isn't a server that has had an OS uptime of 24 years. This is a computer that they are still using after 24 years that "hasn't crashed". So what. The Amiga still being used from the 80s was a bigger deal. This article is really just an ad for Stratus.
What? by Anonymous Coward · 2017-01-28 12:40 · Score: 2, Insightful

'"I can't even think of an instance where we had an unplanned shutdown," he said.'
Um... I should hope so, since if it had, it wouldn't have had 24 years of uptime. And no photo of the output of "uptime"? I'm starting to think that they DID shut it down/reboot it many times, but somehow ignore this in the "uptime". Nonsensical article.
1. Re:What? by ShanghaiBill · 2017-01-28 12:51 · Score: 3, Insightful
  
  if it had, it wouldn't have had 24 years of uptime.
  24 years of "uptime" doesn't mean no unplanned shutdowns. It means no shutdowns of any kind. This machine has not done that, and certainly has not been "running continuously".
2. Re:What? by JoeyRox · 2017-01-28 12:54 · Score: 3, Informative
  
  Agreed. Based on one of the embedded articles referenced in the original, the server had run for at least 4 years continuously, which I still find impressive.
  
  http://www.computerworld.com/article/2550661/data-center/this-server-outlasts-two-presidents.html
BS title by gravewax · 2017-01-28 12:42 · Score: 5, Insightful

It DID NOT run continuously for 24 years. It simply never stopped or restarted without admin intervention, two very very different things. While still impressive, it is nowhere near as impressive as if it had run 24 years continuously.
1. Re:BS title by mysidia · 2017-01-28 14:06 · Score: 3, Informative
  
  Why do you say that's less impressive than running 24 years continuously? Any non-trivial application requires servicing eventually.
  And how will you even be able to tell if that is the case?
  In a virtualization environment, I have servers with 7 year uptimes. Of course, they have occasionally been vMotioned between hosts -- in some cases, servers have been checkpointed, suspended for a few hours, then resumed in another datacenter without any operating system reboot, so if you go by OS uptime they've been up for 10 years.
  Sometimes a server application can become stalled or break, so it hasn't provided continuous service, but there's no visible indication on the server, no administrative indication in the log, etc.
2. Re:BS title by freeze128 · 2017-01-29 06:22 · Score: 2
  
  What's REALLY impressive is that it was running during Y2K! It must have been Y2K compliant YEARS before anyone ever even thought to look for that.
24 years without 'unplanned' shutdowns by guruevi · 2017-01-28 12:45 · Score: 3, Interesting

Not quite the same as 24 year uptime. In the same vein, I have a Sun server that has been running since the mid-90's, part of a medical device and used to compile very particular software code for an old small-bore MRI system. We shut it down when the power goes out (very rare), but its SCSI drives are still good.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Loved the 8086 and 8088 by buss_error · 2017-01-28 12:50 · Score: 5, Interesting

And no, those were the model numbers, not the CPU, which was the M68 series.
About the only thing non-redundant was the clock card. Voice of Experience. The power supplies had built in UPS's. Funny thing on the 808X systems, the power switch had "Off", "On", and past "On" was another state, which I forget what it was called. But if you replaced hardware while running, you'd push it up (it was spring loaded) to get it to IPL the new hardware.
I loved it because you could fold up 24 physical processors into 12, 6 or 4 logical with quorum voting. Get a bad CPU? It wouldn't miss a clock cycle, it'd just lock it out and keep going. You could also run it completely unfolded.
These days, folks would say "so what?" - but "back in the day", your PC had a single core. It was a big deal. And even today, if you get a Check CPU, the system crashes on a PC.

--
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
Is it still the same server? by El+Cubano · 2017-01-28 12:54 · Score: 3, Insightful

"Over the years, disk drives, power supplies and some other components have been replaced but Hogan estimates that close to 80% of the system is original," according to Computerworld.
Then is it still considered the same server? https://en.wikipedia.org/wiki/...
Personally, I have a computer that lives in a case I got in 2003. I am on motherboard #4, power supply #2, processor #2, memory modules #6 & #7, hard drives #4 & #5, etc. However, I still consider it to be the same computer. Perhaps there is something psychological about it, but the name (or in this case the case) has a special significance even if all the guts have been swapped out.
1. Re:Is it still the same server? by mspohr · 2017-01-28 13:12 · Score: 3, Insightful
  
  Reminds me of the farmer who had the same ax for 25 years. He'd replaced the handle 4 times and the head twice... but it was the same ax.
  
  --
  I don't read your sig. Why are you reading mine?
2. Re:Is it still the same server? by Mysticalfruit · 2017-01-28 13:28 · Score: 4, Informative
  
  Since this machine is running VOS and is from the '93 time frame, it's either an X/AR with i860's or a Continuum with PA-RISC. I'll spitball and say it's a Continuum.
  These machines are not like desktops. The hardware and software are extremely tightly coupled. Multiyear uptimes are not uncommon on Stratus VOS machines.
  
  Full disclosure, I'm a former Stratus Employee.
  
  --
  Yes Francis, the world has gone crazy.
3. Re:Is it still the same server? by wonkey_monkey · 2017-01-28 13:46 · Score: 5, Funny
  
  If you change "Theseus" to "farmer" and "ship" to "axe," is it still the same philosophical problem?
  
  --
  systemd is Roko's Basilisk.
4. Re:Is it still the same server? by mysidia · 2017-01-28 14:09 · Score: 2
  
  If Microsoft says it's still the same computer for Windows OEM licensing purposes, so a new license purchase is not required, then I'll say it's still the same server.
Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 13:06 · Score: 4, Funny

Definitely not Hogan's resume.
According to his boss Wilhelm Klink, no one has ever successfully left the company.
Stratus has proprietary redundant *everything*. by Toasterboy · 2017-01-28 13:07 · Score: 5, Informative

Stratus has proprietary redundant *everything* on their machines, and runs in lockstep; they literally have two of everything in there... two motherboards, two cpus, two sets of RAM, etc. If anything weird happens on one side, they fail over to the other motherboard running in lockstep on the other blade in the chassis. Combine that with running an extremely conservative set of drivers that are known stable, and you can get six nines out of the thing. Stratus is typically used for credit card processing and banking applications where it's not ever acceptable to have a machine down for the time it takes to reboot. Really, really, really expensive though. You wouldn't want to use one of these for anything normal.
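
For a sense of scale, here is a rough back-of-the-envelope sketch of what "six nines" would permit. It assumes the usual reading of "N nines" as an availability of 1 - 10^-N and a 365.25-day year; the function name is made up for illustration, and none of this is a Stratus figure.

```python
# Assumption: "N nines" means availability = 1 - 10**-N, measured over a
# 365.25-day year. This is generic arithmetic, not a vendor specification.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_seconds_per_year(nines: int) -> float:
    """Downtime budget per year, in seconds, for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_seconds_per_year(n):,.1f} seconds/year")
```

Under those assumptions six nines leaves a budget of roughly half a minute of downtime per year, which is why a reboot alone can be unacceptable for this kind of workload.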
Re: Not "continuously" in the geek sense of the wo by moofrank · 2017-01-28 13:15 · Score: 5, Interesting

I used to work on Stratus servers, and I think the company was purchased by IBM in the late 90s.
For each running component in the system, there are three physical instances. They use a voting system to drop any disagreement in RAM or the outcome of an instruction. In the 3 years I dealt with them, I never saw a system failure, and the only outages were caused by planned system upgrades. OS stuff. All of the hardware was hot swapped.
These were multimillion dollar machines that basically had the CPU performance of a couple of 68000 CPUs.
I personally witnessed a take out of a Novell 2.x file server which had a 16 year uptime. This was for a school system, and they had forgotten where the file server was. Stuffed in the back of a janitorial closet, and dust covered. That wasn't any sort of fancy hardware, but an old microchannel PC.
If you don't need to patch them, computers run for a long time.
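
The three-instance voting described in this comment can be sketched as a simple majority vote. This is only an illustrative model: the function name and the use of integers as "results" are assumptions, and real hardware would vote in logic on every bus cycle rather than in software.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def vote(outputs: Sequence[int]) -> Tuple[Optional[int], List[int]]:
    """Majority-vote over the results of redundant units.

    Returns (agreed value, indices of units that disagreed with it).
    If no value has a strict majority, returns (None, all indices).
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: nothing can be trusted.
        return None, list(range(len(outputs)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


if __name__ == "__main__":
    # Unit 2 returns a wrong result; the other two outvote it.
    print(vote([7, 7, 9]))
```

With three units, one faulty result is outvoted and the dissenting unit can be flagged for replacement while the other two carry on.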
Re:BS title - actually, probably true by chromaexcursion · 2017-01-28 13:19 · Score: 5, Insightful

Stratus are an old school redundant parallel architecture. You can take a node off line without taking the system down. Beyond that, there are multiple levels of redundancy with components. Portions of the system have certainly been taken down, but the system as a whole kept running.
No one would consider that kind of architecture now; much too expensive, when other solutions are available now. The key word in the previous sentence is "now". Probably not an ad for Stratus, they don't really exist anymore.
The equivalent now is a server farm. There are systems (server farms) that have been running for over a decade.
Re: Not "continuously" in the geek sense of the wo by Ed+Avis · 2017-01-28 19:45 · Score: 2

"an old microchannel PC" - so relatively fancy in fact. The quality and reliability of IBM's Micro Channel machines (and their small number of licensees) was a notch or two above the typical AT clones of the time. In particular they were designed with some attention to airflow and cooling, rather than just a box with a fan in it, so would be more likely to survive a dust-covered existence.

--
-- Ed Avis ed@membled.com
Re: Not "continuously" in the geek sense of the wo by OneoFamillion · 2017-01-28 22:58 · Score: 2

I personally witnessed a take out of a Novell 2.x file server which had a 16 year uptime. This was for a school system, and they had forgotten where the file server was. Stuffed in the back of a janitorial closet, and dust covered.
Was there a sign on the door saying "Beware of the Leopard?"
Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 23:21 · Score: 4, Informative

I used to work at Stratus. Most components were duplicated (disks, IO boards, power supplies). Everything was hot swap. CPUs were duplicated *pairs*. Each pair ran in lockstep, instruction by instruction, and results were compared within a pair. If one pair of CPUs disagreed (between the CPUs in that pair), but the other pair agreed (between the CPUs in that second pair), the first pair was taken off line and the second pair continued processing.
Each pair of CPUs (and its associated memory) was on a separate board. The faulty board would light a red light, and the admin could pull that faulty board with the system running. A replacement board could be installed, again with the system running transactions. When the new board was installed, a process would start synchronizing its memory with the contents of the good board's memory. This was done with the system running (processing bank transactions, for example), so the bus bandwidth between the boards had to be high enough to keep up with the rate at which the memory contents on the good board were changing. At a certain point the memory state of the two boards would be in sync and the new board would start processing, again in lockstep with the board that had been running all along.
The repair process was so easy that one engineering director there had what he called the "mom test" - he'd have his mom come in and see if she could fix a system that had been forced to throw a fault. Red light? Pull that board out and put a new one in. New board's lights went red-yellow-green (as memory was brought into sync), and you're running fault-tolerant again. Easy peasy. (When the boards failed, they'd "phone home" and Stratus tech support would know there was a fault often before the customer did. They'd ship a replacement part overnight to the site with the bad part. Anyone who worked at Stratus in those days knew the story of the FedEx driver doing a repair for a customer.)
OS upgrades were done by un-synchronizing the OS drives and upgrading one at a time. One would be taken off line for an OS upgrade. The system would have to go through a restart to run on the upgraded OS drive (that would be done in a planned maintenance window), but once it was running, the second OS drive would be mirrored to the running OS drive, at which point the disks were redundant again. The unplanned downtime was zero, the planned downtime was minimized.
It became a challenge for Stratus when CPUs became nondeterministic (instructions wouldn't necessarily process in exactly the same order, making lock-step processing a real problem). At least one CPU architecture transition was driven by that issue. And clearing the heat from 4 cpus in a small space was a thermal challenge. But they were reliable beasts, expensive enough only to be used for workloads involving real money (e.g., financial transactions). Back when the founder (Bill Foster) was still CEO, it was a great place to work. When he left and the MBA's moved in, the company got sold to Ascend Communications, then Ascend got bought by Lucent, the non-telecom part of the business was spun off (as Stratus Technologies), and yes they are still in business.
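
The pair-and-compare scheme described in this comment might be modelled roughly as follows. Everything here is a hypothetical sketch: `Board`, `run_step`, and the idea of a CPU as a plain Python function are illustrative assumptions, not Stratus's actual design or any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in: one "CPU" is a function from an input to a result.
Cpu = Callable[[int], int]


@dataclass
class Board:
    """One board carrying a self-checking pair of CPUs."""
    name: str
    cpu_a: Cpu
    cpu_b: Cpu
    online: bool = True

    def step(self, x: int) -> Optional[int]:
        """Run both CPUs on the same input; None means the pair disagreed."""
        a, b = self.cpu_a(x), self.cpu_b(x)
        return a if a == b else None


def run_step(boards: List[Board], x: int) -> int:
    """Step every online board in lockstep and return a self-checked result."""
    result: Optional[int] = None
    for board in boards:
        if not board.online:
            continue
        out = board.step(x)
        if out is None:
            board.online = False  # the "red light": board drops off line
        elif result is None:
            result = out
    if result is None:
        raise RuntimeError("no healthy board left")
    return result


if __name__ == "__main__":
    good: Cpu = lambda x: x + 1
    flaky: Cpu = lambda x: x + 2  # simulates a faulty CPU
    boards = [Board("A", good, flaky), Board("B", good, good)]
    print(run_step(boards, 1), [b.online for b in boards])
```

The point of comparing within a pair, rather than voting across boards, is that each board can detect its own fault and remove itself, so the surviving board never has to decide which of two differing answers is right.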