Server Runs Continuously For 24 Years (computerworld.com)

Not "continuously" in the geek sense of the word by suso · 2017-01-28 12:36 · Score: 4, Insightful

"It never shut down on its own because of a fault it couldn't handle," said Hogan. "I can't even think of an instance where we had an unplanned shutdown," he said.

This isn't a server that has had an OS uptime of 24 years. This is a computer that they are still using after 24 years that "hasn't crashed". So what. The Amiga still being used from the 80s was a bigger deal. This article is really just an ad for Stratus.

What? by Anonymous Coward · 2017-01-28 12:40 · Score: 2, Insightful

'"I can't even think of an instance where we had an unplanned shutdown," he said.'

Um... I should hope so, since if it had, it wouldn't have had 24 years of uptime. And no photo of the output of "uptime"? I'm starting to think that they DID shut it down/reboot it many times, but somehow ignore this in the "uptime". Nonsensical article.

Re:What? by ShanghaiBill · 2017-01-28 12:51 · Score: 3, Insightful

if it had, it wouldn't have had 24 years of uptime.
24 years of "uptime" doesn't mean no unplanned shutdowns. It means no shutdowns of any kind. This machine has not done that, and certainly has not been "running continuously".
Re:What? by JoeyRox · 2017-01-28 12:54 · Score: 3, Informative

Agreed. Based on one of the embedded articles referenced in the original, the server had run for at least 4 years continuously, which I still find impressive.

http://www.computerworld.com/article/2550661/data-center/this-server-outlasts-two-presidents.html
Re:What? by arth1 · 2017-01-28 14:16 · Score: 1

Agreed. Based on one of the embedded articles referenced in the original, the server had run for at least 4 years continuously, which I still find impressive.
4 years is 1461 days, which isn't that impressive.
A couple of prod servers I use:
21:09:09 up 1507 days, 10:21, 0 users, load average: 0.00, 0.01, 0.05
21:10:00 up 1651 days, 9:03, 0 users, load average: 0.02, 0.33, 0.32
And there are some irreplaceable Unix boxen from the mid 90s too, but they don't have as high uptimes, due to needs for repairs and parts cannibalization.
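To put those day counts next to the "4 years is 1461 days" figure, here is a rough sketch that pulls the day count out of each `uptime` line and converts it to years. The regular expression and the helper name `uptime_days` are my own assumptions for illustration; real `uptime` output varies between systems.

```python
import re

# The two uptime lines quoted above, one per server.
UPTIME_LINES = [
    "21:09:09 up 1507 days, 10:21, 0 users, load average: 0.00, 0.01, 0.05",
    "21:10:00 up 1651 days, 9:03, 0 users, load average: 0.02, 0.33, 0.32",
]

DAYS_PER_YEAR = 365.25  # average including leap years, so 4 years = 1461 days


def uptime_days(line):
    """Return the whole-day count from an `uptime` line, or None if absent."""
    match = re.search(r"up\s+(\d+)\s+days?", line)
    return int(match.group(1)) if match else None


for line in UPTIME_LINES:
    days = uptime_days(line)
    print(f"{days} days is about {days / DAYS_PER_YEAR:.2f} years")
```

Both boxes come out a bit over four years, which is the point being made.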
Re:What? by rickb928 · 2017-01-28 15:22 · Score: 1

Our software used to rely on Stratus servers until we were forced to rely on eBay for spares... before that we had no downtime recorded for 8 years before i came, and none for 7 years after.
I managed Novell servers with uptimes over 3000 days, which for Novell servers wasn't at all unusual. For most of these we didn't rely on the IDE drivers for data, so we avoided the clock problems by kicking the driver in the head.

--
deleting the extra space after periods so i can stay relevant, yeah.
Re: What? by rickb928 · 2017-01-31 03:21 · Score: 1

That's funny. No, really, you're funny.

--
deleting the extra space after periods so i can stay relevant, yeah.

BS title by gravewax · 2017-01-28 12:42 · Score: 5, Insightful

It DID NOT run continuously for 24 years. It simply never stopped or restarted without admin intervention, two very, very different things. While still impressive, it is nowhere near as impressive as if it had run 24 years continuously.

Re:BS title by Anonymous Coward · 2017-01-28 13:55 · Score: 1

For those who want to see an actual awesome uptime, there was a Reddit thread yesterday about a Cisco 2500 that has been up 20 years, without any interruption or downtime, since January 29, 1997.
Re:BS title by mysidia · 2017-01-28 14:06 · Score: 3, Informative

Why do you say that's less impressive than running 24 years continuously? Any non-trivial application requires servicing eventually.
And how will you even be able to tell if that is the case?
In a virtualization environment, I have servers with 7 year uptimes. Of course, they have occasionally been vMotioned between hosts -- in some cases, servers have been checkpointed, suspended for a few hours, then resumed in another datacenter without any operating system reboot, so if you go by OS uptime they've been up for 10 years.
Sometimes a server application can stall or break, so it hasn't provided continuous service, but there's no visible indication on the server, no administrative indication in the log, etc.
Re:BS title by bytesex · 2017-01-29 04:21 · Score: 1

Is it also pwned by every major government on the planet?

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:BS title by freeze128 · 2017-01-29 06:22 · Score: 2

What's REALLY impressive is that it was running during Y2K! It must have been Y2K compliant YEARS before anyone ever even thought to look for that.
Re:BS title by skids · 2017-01-29 08:43 · Score: 1

Depending on what features you are using the attack surface can be very small on these, so even if you don't have an out-of-band management system (or no management system, if you don't need to change the config enough for running to the closet with a console cable and a laptop to be a chore) they can be pretty much hack-proof.

--
Someone had to do it.
Re:BS title by mysidia · 2017-01-29 12:14 · Score: 1

Yeepp.... Probably systems that weren't doing any substantial amount of processing or I/O.
Re:BS title by TechnoJoe · 2017-01-30 07:28 · Score: 1

As a biological machine, I have an uptime of 35 years. While I have been put into sleep mode from time to time, I have never been shutdown. In fact, one of the disadvantages of my particular architecture is that once shutdown, I cannot be restarted.
Re:BS title by herbierobinson · 2017-02-02 01:30 · Score: 1

Yes, it was Y2K compliant. So was Linux, Mac OS and just about every other OS on the planet EXCEPT Windows.

--
An engineer who ran for Congress. http://herbrobinson.us

24 years without 'unplanned' shutdowns by guruevi · 2017-01-28 12:45 · Score: 3, Interesting

Not quite the same as 24 year uptime. In the same vein, I have a Sun server that has been running since the mid-90s, part of a medical device and used to compile very particular software code for an old small-bore MRI system. We shut it down when the power goes out (very rare), but its SCSI drives are still good.

--
Custom electronics and digital signage for your business: www.evcircuits.com

Re:24 years without 'unplanned' shutdowns by jandrese · 2017-01-28 16:30 · Score: 1

While I've never worked directly with Stratus boxes, my understanding is that the machines have redundant and hot-swappable everything, so it's possible to completely replace half of the box while the other half is serving normally, and then switch it over and do the same on the other half. No unplanned outage might well mean that it never stopped doing whatever it is that the server is tasked with, even when parts of it had to be replaced or upgraded. Even the OS all the way down to the kernel can be upgraded without so much as a stall in application service.

But I also heard that they pay for that capability by being ridiculously expensive and slow.

--

I read the internet for the articles.
Re:24 years without 'unplanned' shutdowns by guruevi · 2017-01-29 04:52 · Score: 1

Sun servers did as well; they were one of the first machines besides mainframes that even had hot-swappable CPU and RAM, and Solaris kernels could be upgraded without a reboot.

--
Custom electronics and digital signage for your business: www.evcircuits.com

lol by sunking2 · 2017-01-28 12:49 · Score: 1

IT application architect meet sanitation engineer.

Loved the 8086 and 8088 by buss_error · 2017-01-28 12:50 · Score: 5, Interesting

And no, those were the model numbers, not the CPU, which was the M68 series.

About the only thing non-redundant was the clock card. Voice of Experience. The power supplies had built in UPS's. Funny thing on the 808X systems, the power switch had "Off", "On", and past "On" was another state, which I forget what it was called. But if you replaced hardware while running, you'd push it up (it was spring loaded) to get it to IPL the new hardware.

I loved it because you could fold up 24 physical processors into 12, 6 or 4 logical with quorum voting. Get a bad CPU? It wouldn't miss a clock cycle, it'd just lock it out and keep going. You could also run it completely unfolded.

These days, folks would say "so what?" - but "back in the day", your PC had a single core. It was a big deal. And even today, if you get a Check CPU, the system crashes on a PC.

--
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.

Re:Not "continuously" in the geek sense of the wor by fustakrakich · 2017-01-28 12:54 · Score: 1

This article is really just an ad for Stratus.

Or Phil Hogan's résumé

--
“He’s not deformed, he’s just drunk!”

Is it still the same server? by El+Cubano · 2017-01-28 12:54 · Score: 3, Insightful

"Over the years, disk drives, power supplies and some other components have been replaced but Hogan estimates that close to 80% of the system is original," according to Computerworld.

Then is it still considered the same server? https://en.wikipedia.org/wiki/...

Personally, I have a computer that lives in a case I got in 2003. I am on motherboard #4, power supply #2, processor #2, memory modules #6 & #7, hard drives #4 & #5, etc. However, I still consider it to be the same computer. Perhaps there is something psychological about it, but the name (or in this case the case) has a special significance even if all the guts have been swapped out.

Re:Is it still the same server? by mspohr · 2017-01-28 13:12 · Score: 3, Insightful

Reminds me of the farmer who had the same ax for 25 years. He'd replaced the handle 4 times and the head twice... but it was the same ax.

--
I don't read your sig. Why are you reading mine?
Re:Is it still the same server? by Mysticalfruit · 2017-01-28 13:28 · Score: 4, Informative

Since this machine is running VOS, and from the '93 time frame it's either an X/AR with i860's or a Continuum with PA-RISC. I'll spitball and say it's a Continuum.
These machines are not like desktops. The hardware and software is extremely tightly coupled. Multiyear uptimes are not uncommon on Stratus VOS machines.

Full disclosure, I'm a former Stratus Employee.

--
Yes Francis, the world has gone crazy.
Re:Is it still the same server? by wonkey_monkey · 2017-01-28 13:46 · Score: 5, Funny

If you change "Theseus" to "farmer" and "ship" to "axe," is it still the same philosophical problem?

--
systemd is Roko's Basilisk.
Re:Is it still the same server? by mysidia · 2017-01-28 14:09 · Score: 2

If Microsoft says it's still the same computer for Windows OEM licensing purposes, so a new license purchase is not required, then I'll say it's still the same server.
Re:Is it still the same server? by arth1 · 2017-01-28 14:40 · Score: 1

Personally, I have a computer that lives in a case I got in 2003. I am on motherboard #4, power supply #2, processor #2, memory modules #6 & #7, hard drives #4 & #5, etc. However, I still consider it to be the same computer
I'm sure a lot of us have systems that have been upgraded so many times that the only parts still the same are the case (Lian-Li cases last forever) and the name.
However, my main server, serving DNS, DHCP, SMTP, POP3S, HTTP, NTP, Samba, NFS and NIS, is still the same as it was in 2001. PIII-S, 512 MB RAM, and while hard drives have been replaced over the years, it's still running well, with an OS up-to-date as per today. It's power frugal enough that it doesn't even have a CPU fan, despite the system having run overclocked from 133 to 140 MHz for 16 years now.
They don't make them like they used to.
Re:Is it still the same server? by the_humeister · 2017-01-28 16:48 · Score: 1

This is more a philosophical question along the lines of ship of Theseus.
All the components of your cells are replaced about every 7 years. Are you still the same person every 7 years?
Re:Is it still the same server? by Agripa · 2017-01-30 05:22 · Score: 1

Personally, I have a computer that lives in a case I got in 2003. I am on motherboard #4, power supply #2, processor #2, memory modules #6 & #7, hard drives #4 & #5, etc.
My FreeBSD router has been running since about 2000 with a Celeron 300A and Supermicro P6SBA Revision 2.0 motherboard with 384M of ECC SDRAM.
I upgraded the storage from a 600MB hard drive to compact flash a couple years ago for faster booting and lower power. The power supply has been replaced once.
Other than power outages, occasional software updates, and routine maintenance for cleaning dust and maintaining the fans, the only failure has been when the ice machine upstairs sprung a leak which managed to drip right into the case and onto the motherboard. The system was out of operation for about a day to dry out but a backup system was built in about 15 minutes running the same BSD image on a Pentium 4 box.
Re:Is it still the same server? by Agripa · 2017-01-30 06:09 · Score: 1

I do not know if he changed it but I have one of my great grandfather's ball and peen hammers and I have changed the handle once where I also refinished it and painted it.
Re:Is it still the same server? by maestroX · 2017-01-30 10:41 · Score: 1

Full disclosure, I'm a former Stratus Employee.
Ahem, full disclosure, what's yer uptime?
Re:Is it still the same server? by michael_wojcik · 2017-02-03 11:21 · Score: 1

Who can say? I used the axe to chop the ship apart and built a temple with a golden pavilion from the timber.
Re:Is it still the same server? by michael_wojcik · 2017-02-03 11:22 · Score: 1

Did that axe you're grinding belong to your grandfather?

Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 13:06 · Score: 4, Funny

Definitely not Hogan's resume.

According to his boss Wilhelm Klink, no one has ever successfully left the company.

Stratus has proprietary redundant *everything*. by Toasterboy · 2017-01-28 13:07 · Score: 5, Informative

Stratus has proprietary redundant *everything* on their machines, and runs in lockstep; they literally have two of everything in there... two motherboards, two cpus, two sets of RAM, etc. If anything weird happens on one side, they fail over to the other motherboard running in lockstep on the other blade in the chassis. Combine that with running an extremely conservative set of drivers that are known stable, and you can get six nines out of the thing. Stratus is typically used for credit card processing and banking applications where it's not ever acceptable to have a machine down for the time it takes to reboot. Really, really, really expensive though. You wouldn't want to use one of these for anything normal.
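For a sense of scale, "six nines" means 99.9999% availability. A back-of-the-envelope sketch of what that allows per year (the function name and the 365.25-day year are my assumptions, not anything from Stratus):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, leap days included


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. six nines -> 1e-6
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_seconds_per_year(n):.1f} s/year")
```

At six nines the budget is roughly half a minute per year, which is less than a typical reboot.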

Re:Stratus has proprietary redundant *everything*. by c · 2017-01-29 02:22 · Score: 1

Really, really, really expensive though. You wouldn't want to use one of these for anything normal.
Environment Canada used to run a similar architecture from Tandem for processing weather data. They wanted the "real timey" aspects of how it dealt with data, but the extreme data processing redundancy was a bit of a problem ("don't lose my money" is massive overkill for a temperature value that's updated at least hourly) and they ended up doing some deep O/S development to cut 150 disk writes per data element down to something sane.
They were solid, though; only time I ever saw one go completely dark was when someone did a generator test and a UPS battery bank exploded.

--
Log in or piss off.
Re:Stratus has proprietary redundant *everything*. by herbierobinson · 2017-02-02 01:41 · Score: 1

A lot of manufacturing shops use them to run production lines (where the computer crashing can cause the entire line to shut down).
They are also part of the 911 system.
The other reason one occasionally wants voting hardware is to detect failures. If the numbers you are crunching are really important, you want to be sure you get the right answer. I certainly hope that the people designing self-driving cars are using voting computers, redundant sensors, and redundant actuators. I don't want a glitch in some microprocessor to send me into a head-on collision! [I wouldn't use a Stratus computer for that, but I would build voting into the CPU chip -- in the correct way so that memory is also voted.]

--
An engineer who ran for Congress. http://herbrobinson.us

"character-driven interface" by TeknoHog · 2017-01-28 13:13 · Score: 1

FTA:

Even though the system has a character-driven interface, similar to an old green screen system, the users "like the reliability of it, and the screens are actually pretty simple," said Hogan.

Is there any other way to run a serious server?

--
Escher was the first MC and Giger invented the HR department.

Re: Not "continuously" in the geek sense of the wo by moofrank · 2017-01-28 13:15 · Score: 5, Interesting

I used to work on Stratus servers, and I think the company was purchased by IBM in the late 90s.

For each running component in the system, there are three physical instances. They use a voting system to drop any disagreement in RAM or the outcome of an instruction. In the 3 years I dealt with them, I never saw a system failure, and the only outages were caused by planned system upgrades. OS stuff. All of the hardware was hot swapped.
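The three-instance voting described here is usually called triple modular redundancy. A minimal software sketch of a majority voter, assuming each replica simply hands back a comparable result (the real hardware voted at a much lower level, and `vote` is a name I made up):

```python
from collections import Counter


def vote(results):
    """Return the majority value among replica results, plus the indices
    of replicas that disagreed with it. Raises if there is no majority."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority: replicas cannot be trusted")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


# Replica 1 returns a corrupted result; the other two outvote it.
value, faulty = vote([42, 41, 42])
print(value, faulty)
```

The dissenting replica is the one you would lock out and replace while the other two keep running.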

These were multimillion dollar machines that basically had the CPU performance of a couple of 68000 CPUs.

I personally witnessed a take out of a Novell 2.x file server which had a 16 year uptime. This was for a school system, and they had forgotten where the file server was. Stuffed in the back of a janitorial closet, and dust covered. That wasn't any sort of fancy hardware, but an old microchannel PC.

If you don't need to patch them, computers run for a long time.

Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 13:18 · Score: 1

Definitely not Hogan's resume.

According to his boss Wilhelm Klink, no one has ever successfully left the company.

I get your joke, but you just confused a large number of young people. Who all need to get off my lawn.

Re:BS title - actually, probably true by chromaexcursion · 2017-01-28 13:19 · Score: 5, Insightful

Stratus are an old school redundant parallel architecture. You can take a node off line without taking the system down. Beyond that multiple levels of redundancy with components. Portions of the system have certainly been taken down, but the system as a whole kept running.
No one would consider that kind of architecture now; much too expensive, when other solutions are available now. The key word in the previous sentence is "now". Probably not an ad for Stratus, they don't really exist anymore.
The equivalent now is a server farm. There are systems (server farms) that have been running for over a decade.

For some perspective by Anonymous Coward · 2017-01-28 13:24 · Score: 1

Windows 95 had a bug that made it crash when the uptime hit 2^32 milliseconds, or 49.7 days. Since Windows usually crashed much sooner anyway, it took Microsoft years to notice that bug.
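The 49.7-day figure is just what you get when a 32-bit counter of milliseconds wraps around. A quick check (variable names are mine, for illustration only):

```python
# A 32-bit unsigned counter of milliseconds wraps after 2**32 ticks.
MS_PER_DAY = 24 * 60 * 60 * 1000

wrap_ms = 2 ** 32
wrap_days = wrap_ms / MS_PER_DAY

print(f"{wrap_ms} ms is about {wrap_days:.1f} days")
```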

Re:For some perspective by Anonymous Coward · 2017-01-28 13:57 · Score: 1

For some more perspective, parent post is informative because everyone familiar with Win95 has already left /. in disgust.

Novell... by perotbot · 2017-01-28 13:31 · Score: 1

Not proprietary, though NetWare is no longer supported, there's not a lot to go wrong, and some boxes had epic uptimes, as in never died, never rebooted. We had one that the only reason it completely went down was a catastrophic power loss (both PDUs lost power at the same time). Its uptime was over a decade with over 50 users still accessing every day. All that being said, anything that's still running 24 years after initial boot is impressive and worthy of note. NOTHING running Windows would have done that. Perhaps something running on an IBM "z" series could. Given that my IT career began in 91, I'm lucky to be running this long without a reboot.

--
~corporate tool, but employed~

Meh, there's a lot of these out there... by Phydeaux · 2017-01-28 13:51 · Score: 1

I built a webserver on a PowerMac 7200 in 1996 and the machine's been running 24/7/365 since (barring power outages longer than the UPS battery, etc). Not a single component has been replaced, the OS (System 9.2.1) never updated, the software (WebSTAR) only patched until the company went out of business. I'd be willing to bet that there's a lot of servers like this still floating around universities and school districts...

Re:BS title - actually, probably true by Kjella · 2017-01-28 13:56 · Score: 1

Don't mainframes kinda do the same thing today? I know they're not exactly mainstream but not everything is well suited to being done by a farm, where you really need serialization and global consistency. If you're Facebook or Google the page doesn't have to perfectly reflect changes someone did 0.01 second ago. If you're doing bank transactions or booking tickets then you really need to know if there's still money in the account or the seat is still free. NoSQL is great if you don't need all the guarantees of ACID SQL. Sadly some people think it's the "next gen" and can replace everything relational databases do today.

--
Live today, because you never know what tomorrow brings

Re: Not "continuously" in the geek sense of the wo by Lisias · 2017-01-28 14:24 · Score: 1

Definitely not Hogan's resume.

According to his boss Wilhelm Klink, no one has ever successfully left the company.

alive.

--
Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org

Re:Running AmigaOS? by arth1 · 2017-01-28 14:50 · Score: 1

A1000 here, although the monitor isn't working, so there's some jury rigging to get it to work with a more modern CRT.

But in terms of longevity of a computing device in regular use, my slipstick from the 60s is still in my pocket every day. Plastic coated pearwood is durable.

Telephone switches had uptimes of decades by davidwr · 2017-01-28 15:13 · Score: 1

Old-school pre-1990s telephone switches - you know, those nearly-building-sized things that kept thousands or tens of thousands of phones in a city working - had uptimes measured in decades.

Short of either a scheduled replacement or a physical disaster, they kept running and running and running.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re: It should have been retired 15 years ago. by rickb928 · 2017-01-28 15:31 · Score: 1

0. Many Stratus servers are single-purpose. They didn't need updates once the systems were stable.

1. Stratus OS were never susceptible to viruses, botnets, etc. They aren't Windows, or anything like it.

--
deleting the extra space after periods so i can stay relevant, yeah.

Re: Not "continuously" in the geek sense of the w by Anonymous Coward · 2017-01-28 15:46 · Score: 1

Don't try to "add to the joke" if you don't know what you're talking about.

On the show, the POWs were able to come & go as they pleased due to Klink's incompetence.

Yep. Alternative title: server doesn't run Windows by raymorris · 2017-01-28 16:09 · Score: 1

Yeah the system in the article has been down for maintenance for various reasons. It hasn't CRASHED since it was put into service. A computer that doesn't crash? That's impressive - if it runs Windows. I've been running servers since the mid 1990s and I'd say MOST of them have never crashed.

Re:Not "continuously" in the geek sense of the wor by Tough+Love · 2017-01-28 16:56 · Score: 1

This isn't a server that has had an OS uptime of 24 years. This is a computer that they are still using after 24 years that "hasn't crashed".

Not even that. My understanding is, when Stratus fails over processors it is just a quiet reboot. Didn't turn off the power, yay.

In the good old days the mainframe boys would hot-swap mainframes by running the new processor in lock-step with the old one, even across vendors (It is rumoured that Amdahl made some sales this way.) The Voyager computers have been running for 40 years.

--
When all you have is a hammer, every problem starts to look like a thumb.

Re: Not "continuously" in the geek sense of the wo by Tough+Love · 2017-01-28 17:06 · Score: 1

IBM was a reseller for Stratus but Stratus is still an independent company.

--
When all you have is a hammer, every problem starts to look like a thumb.

Re:Not "continuously" in the geek sense of the wor by Tough+Love · 2017-01-28 17:30 · Score: 1

OK, digging into that... Stratus is doing some weird and cool stuff. Not running processors in cycle-for-cycle lockstep like the mainframe guys (at least, not with their x86 offerings where Intel would never permit the level of systems integration that would be required) but at the memory access level instead, as in, if two processors are running the same code on the same data, then they must access memory with the same pattern. Hard to see how that could be made to work without some kind of hypervisor, which they most probably use. Cache effects would be a nightmare.

--
When all you have is a hammer, every problem starts to look like a thumb.

Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 17:52 · Score: 1

I had an NT 3.51 Server that had an uptime of 8+ years. I was on a project to convert all of the machines from Token Ring to Ethernet and saved this one for last just because I couldn't bear to be the one to take it down. With the NT 4.0 servers we had, we were lucky to get an uptime of 30 days.

What words mean by ChrisMaple · 2017-01-28 19:20 · Score: 1

And while he believes the server's proprietary operating system hasn't been updated in 15 years, Hogan says "It's been extremely stable."

Stable means unchanged. If it hasn't been updated in 15 years, of course it's stable.

--
Contribute to civilization: ari.aynrand.org/donate

Re: Not "continuously" in the geek sense of the wo by Ed+Avis · 2017-01-28 19:45 · Score: 2

"an old microchannel PC" - so relatively fancy in fact. The quality and reliability of IBM's Micro Channel machines (and their small number of licensees) was a notch or two above the typical AT clones of the time. In particular they were designed with some attention to airflow and cooling, rather than just a box with a fan in it, so would be more likely to survive a dust-covered existence.

--
-- Ed Avis ed@membled.com

Re: Not "continuously" in the geek sense of the wo by OneoFamillion · 2017-01-28 22:58 · Score: 2

I personally witnessed a take out of a Novell 2.x file server which had a 16 year uptime. This was for a school system, and they had forgotten where the file server was. Stuffed in the back of a janitorial closet, and dust covered.

Was there a sign on the door saying "Beware of the Leopard?"

Re:Running AmigaOS? by Bert64 · 2017-01-28 23:05 · Score: 1

Likely AmigaOS will have been rebooted many many times in all those years, and chances are you've replaced the battery and possibly the motherboard capacitors too in all that time.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!

Re:Replaced power supplies? by driblio · 2017-01-28 23:16 · Score: 1

Have you ever used a real server?

Re: Not "continuously" in the geek sense of the wo by Anonymous Coward · 2017-01-28 23:21 · Score: 4, Informative

I used to work at Stratus. Most components were duplicated (disks, IO boards, power supplies). Everything was hot swap. CPUs were duplicated *pairs*. Each pair ran in lockstep, instruction by instruction, and results were compared within a pair. If one pair of CPUs disagreed (between the CPUs in that pair), but the other pair agreed (between the CPUs in that second pair), the first pair was taken off line and the second pair continued processing.
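The scheme above (often called pair-and-spare) can be sketched in a few lines. This is a toy model under my own assumptions: each "CPU" is just a Python function, a board is a pair of them, and names like `step` and `flaky_cpu` are hypothetical; the real comparison happened in hardware on every instruction.

```python
def run_pair(cpu_a, cpu_b, instruction):
    """Run one instruction on both CPUs of a pair; return (agree, result)."""
    a, b = cpu_a(instruction), cpu_b(instruction)
    return a == b, a


def step(boards, instruction):
    """Execute one instruction on every online board.

    `boards` maps a board name to its (cpu_a, cpu_b) pair. A board whose
    two CPUs disagree is removed (taken offline); the result comes from
    a board whose CPUs agree with each other.
    """
    result = None
    for name in list(boards):
        agree, value = run_pair(*boards[name], instruction)
        if agree:
            result = value
        else:
            del boards[name]  # the "red light": this board gets pulled
    if not boards:
        raise RuntimeError("no self-consistent board left")
    return result


def good_cpu(x):
    return x * 2


def flaky_cpu(x):  # hypothetical faulty CPU that computes the wrong answer
    return x * 2 + 1


boards = {"board0": (good_cpu, flaky_cpu), "board1": (good_cpu, good_cpu)}
print(step(boards, 21), sorted(boards))
```

Note that no vote between boards is needed: a pair that disagrees with itself is known bad, so the surviving pair simply carries on.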

Each pair of CPUs (and their associated memory) was on a separate board. The faulty board would light a red light, and the admin could pull that faulty board with the system running. A replacement board could be installed, again with the system running transactions. When the new board was installed, a process would start synchronizing its memory with the contents of the good board's memory. This was done with the system running (processing bank transactions, for example), so the bus bandwidth between the boards had to be high enough to handle the rate at which the memory contents on the good board were changing. At a certain point the memory state of the two boards would be in sync and the new board would start processing, again in lockstep with the board that had been running all along. The repair process was so easy that one engineering director there had what he called the "mom test" - he'd have his mom come in and see if she could fix a system that had been forced to throw a fault. Red light? Pull that board out and put a new one in. New board's lights went red-yellow-green (as memory was brought into sync), and you're running fault-tolerant again. Easy peasy. (When the boards failed, they'd "phone home" and Stratus tech support would know there was a fault often before the customer did. They'd ship a replacement part overnight to the site with the bad part. Anyone who worked at Stratus in those days knew the story of the FedEx driver doing a repair for a customer.)

OS upgrades were done by un-synchronizing the OS drives and upgrading one at a time. One would be taken off line for an OS upgrade. The system would have to go through a restart to run on the upgraded OS drive (that would be done in a planned maintenance window), but once it was running, the second OS drive would be mirrored to the running OS drive, at which point the disks were redundant again. The unplanned downtime was zero, the planned downtime was minimized.

It became a challenge for Stratus when CPUs became nondeterministic (instructions wouldn't necessarily process in exactly the same order, making lock-step processing a real problem). At least one CPU architecture transition was driven by that issue. And clearing the heat from 4 cpus in a small space was a thermal challenge. But they were reliable beasts, expensive enough only to be used for workloads involving real money (e.g., financial transactions). Back when the founder (Bill Foster) was still CEO, it was a great place to work. When he left and the MBA's moved in, the company got sold to Ascend Communications, then Ascend got bought by Lucent, the non-telecom part of the business was spun off (as Stratus Technologies), and yes they are still in business.

Happy with less by UnixUnix · 2017-01-29 05:01 · Score: 1

Two years untouched, I returned and saw its screen had gone blank. Linux distro, booted from DVD, in-RAM, quite a chore to bring everything back after shutdown and reboot. But it wasn't needed: ctrl-alt-F2, init 2 to be sure, init 5 and presto the desktop was back. Singing along. Whatever the issue it didn't take my old Linux down. Happy for life's little joys.

Re:Nobel NetWare by PPH · 2017-01-29 05:27 · Score: 1

NetWare ... they could still ping it on the network.

They didn't use enough concrete.

--
Have gnu, will travel.

Re:Dumb by fisted · 2017-01-29 06:39 · Score: 1

It's stupid to upgrade when there's no reason to upgrade.

Doing that would be a sign of a shitty sysadmin, dear PFY.

--
CLI paste? paste.pr0.tips!

Re:Not "continuously" in the geek sense of the wor by Rich.Miller.6 · 2017-01-29 07:52 · Score: 1

I'm not as surprised as other people making comments about this article. In 1993, I ported Raima Data Manager (at the time, a network-model DBMS running 12,000 different commercial applications; you never heard of it, because it simply worked) to Stratus VOS. The manager of the Stratus office in Bellevue, WA gave me a tour. In the glass-walled machine room, he opened up the Stratus machine running the office - the center of the company's northwest US sales operation - and pulled out a board. I looked out of the glass walls in horror. After a few seconds, the manager pushed the board back in and said, "Look at this." So I looked at the console. The messages, as best I can remember, said, "Board 9: CPU. Removed." "Board 9: CPU: inserted... testing... OK... Online." The salesmen in their offices never even looked up.

Two other things struck me at the time as being radically different from what I was used to. First, during the port I accidentally used the debugger command to step *into* a low-level C-language routine. The message that came back let me know that the source code lived on development disk 1 of XXX machine in the Los Angeles office, and because I didn't have permissions allowing me to see that code it wasn't going to show it to me. Wow - seamless wide-area networking in 1993. Second, I learned that Stratus VOS only supported a (highly-capable) Stratus terminal, and that my programs had to work with that and nothing else. I asked, what if I'm running a Wyse 50 Whizbang 7? The manager said that I'd simply register that terminal and its characteristics with the operating system - there was an easy way to do that - and the operating system would take care of any necessary translations. Wow again, something Unix got wrong: of course the operating system should take care of supporting different kinds of terminals! (Just like disk drives: my programs should not know or care about low-level details like how to write to a disk or terminal.) Finding something Unix got wrong is rare indeed.

Stratus VOS (a descendant of Multics, cf. Unix) got a surprising number of things right. Having a server actually running "next to forever" doesn't surprise me.

Re: Not "continuously" in the geek sense of the wo by skids · 2017-01-29 08:34 · Score: 1

I personally witnessed a take out of a Novell 2.x file server which had a 16 year uptime. This was for a school system, and they had forgotten where the file server was. Stuffed in the back of a janitorial closet, and dust covered.

I'll go one better on that... I know of one that was up for a couple of decades and finally failed, and when they went looking for it, they had to break through some drywall into an odd corner of a closet where it had accidentally been sealed off by construction contractors.

--
Someone had to do it.

Re: It should have been retired 15 years ago. by skids · 2017-01-29 08:48 · Score: 1

There is no way to make a 100% secure networked operating system

Got a mathematical proof for that statement? Because that's what's required for such a claim.

--
Someone had to do it.

Re: It should have been retired 15 years ago. by skids · 2017-01-29 08:50 · Score: 1

Out of morbid curiosity, what qualifies as "supported"?

--
Someone had to do it.

Re: Not "continuously" in the geek sense of the wo by LinuxIsGarbage · 2017-01-29 11:35 · Score: 1

I'll go one better on that... I know of one that was up for a couple of decades and finally failed, and when they went looking for it, they had to break through some drywall into an odd corner of a closet where it had accidentally been sealed off by construction contractors.

You mean the one everyone read about 16 years ago
Slashdot Article

First Rule of Server Uptime... by kimgkimg · 2017-01-30 05:29 · Score: 1

... is not to talk about server uptime. To anyone. Now you've just jinxed it.

Re: Not "continuously" in the geek sense of the wo by abmw · 2017-01-30 06:34 · Score: 1

Disssssssss.......misssssssed!!

Re: Not "continuously" in the geek sense of the wo by sr180 · 2017-01-30 16:38 · Score: 1

When I managed NT 4 SP6 servers - as soon as Task Manager showed around 500 idle hours, it was time for a reboot, because magical shit would start happening.

--
In Soviet Russia the insensitive clod is YOU!

Re: It should have been retired 15 years ago. by rickb928 · 2017-01-31 03:26 · Score: 1

Note that quantum encryption is being challenged. I'm pretty sure proving it's not possible is evident. Now the question you should have asked was if successful attacks on systems could be completed in a meaningful period of time... Which is almost a stupid question.

So far, however, absolute security seems unattainable in practice. And those who are successful probably don't disclose it, so we don't know...

--
deleting the extra space after periods so i can stay relevant, yeah.

Re: Not "continuously" in the geek sense of the wo by herbierobinson · 2017-02-02 01:26 · Score: 1

Given that the version of the OS that supports that machine hasn't been updated for more than a decade, that machine probably has been running continuously for a lot more than 10 years.

If it was bought in 1993, the CPUs were probably PA-RISC, not 68K. I can't tell for sure, because the picture was not a Stratus machine.

The current generation of CPUs are functionally dual-socket Xeons in 4U rack enclosures. The heat envelope allows for up to 24 cores. Operating systems supported are VOS (the original proprietary OS -- which is still being developed), Windows Server, Linux and VMware.

--
An engineer who ran for Congress. http://herbrobinson.us

Re:BS title - actually, probably true by michael_wojcik · 2017-02-03 11:19 · Score: 1

"not exactly mainstream?" Unless you're talking about real industries, like insurance, I suppose. Sure, Facebook and the like can get by with never-consistent kill-them-all-sort-them-later distributed farms of commodity PCs, but there are still some businesses which need a modicum of reliability in their data processing.

There are probably on the order of 10,000 System z installations. Yes, that's a small number relative to x64, but it's still very much "mainstream", particularly when you look at how they're used.

In practice, System z machines these days are pretty much all running a bare-metal hypervisor (derived from IBM's VM OS, which was the first commercial virtual-machine OS) hosting various "LPARs" (logical partitions, i.e. virtual machines). The OSes in those LPARs - which may be z/OS, zLinux, z/VM, z/VSE, TPF, and possibly others - will be "IPL'd" (rebooted) frequently or infrequently, to handle those configuration changes and patches that require it, depending on how the organization likes to schedule such things. The hardware itself and the hypervisor are likely to stay up for years at a time. Hardware upgrades are probably the most common reason for a hardware shutdown.

That said, z isn't a completely fault-tolerant architecture like Stratus or Tandem (now part of HPE). There are various fault-tolerant options for z machines, but I'm not aware of any configuration that's like Stratus' "open the cabinet and yank out a CPU card" level of tolerance.
