1 In 3 Data Center Servers Is a Zombie

Money by 14erCleaner · 2015-06-21 02:24 · Score: 4, Insightful

It's not a management issue, either - it's money. People cost more than dead servers.

--
Have you read my blog lately?

Re:Money by ColdWetDog · 2015-06-21 02:49 · Score: 1, Insightful

Money (or lack of it) IS a management issue....
But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"? I suspect that there is a more nefarious purpose here and I propose a corollary to Hanlon's (Heinlein's) Razor:
This is the 21st Century - "You have attributed conditions to villainy that simply result from villainy". Incompetence is for the proletariat - we're the NSA. You're toast.

--
Faster! Faster! Faster would be better!
Re:Money by gstoddart · 2015-06-21 03:40 · Score: 4, Insightful

But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"?
Why should the data center even care?
Most of them are essentially charging rent ... as long as the customer keeps paying, WTF do they care if you actually use them for anything?
This isn't incompetence on the part of the data centers. Maybe it is on the part of companies that have lost track of what their machines are for.

--
Lost at C:>. Found at C.
Re:Money by DigiShaman · 2015-06-21 04:08 · Score: 1

Physical space and electricity aren't cheap. And that's part of why virtualization is cost-effective. It allows you to consolidate to fewer pieces of hardware and pull old bare metal servers out that were once provisioned for one server.
If IT management is leasing more than one cage and has a bunch of zombie servers occupying valuable space, heads need to roll. Or at the very least, severe ass-chewing.

--
Life is not for the lazy.
Re:Money by myowntrueself · 2015-06-21 07:37 · Score: 2

Money (or lack of it) IS a management issue....
But how hard is it to automate a process that says, in effect, "if no data is going in or out of this server, shut it down"? I suspect that there is a more nefarious purpose here and I propose a corollary to Hanlon's (Heinlein's) Razor:
This is the 21st Century - "You have attributed conditions to villainy that simply result from villainy". Incompetence is for the proletariat - we're the NSA. You're toast.
If a customer is paying for it to be there and be kept turned on, *maybe* that customer has some use for the server. Oh, I don't know, maybe it's a hot spare in case another server in another data center goes down? So you turn it off, their other server goes down, their service can't fail over and now your customer has a problem.

--
In the free world the media isn't government run; the government is media run.
Re:Money by Anne+Thwacks · 2015-06-21 07:39 · Score: 1

It is a reporting issue: it is perfectly normal.
Some people do not manage remote servers over long periods.
You install three identical servers: one running the public facing web server, one running the database server, connected by a separate, private network. The third one is available for the new version of the software to be installed, and then activated. Once the software is upgraded on all three, you keep it running as a hot standby. If reliable service to clients is not worth more than the cost of running a hot standby, you probably would not have any servers in a colo.

--
Sent from my ASR33 using ASCII
Re:Money by phoenix321 · 2015-06-21 08:27 · Score: 1

Where I work, electricity is 0,25 EUR/kWh and a specialist in IT or law costs 1.000,- EUR/d or more.
Assume the server we're planning to shut down is rather old (they usually are), so it will probably fail on its own within 3 years, if not much sooner. It is not doing much anymore, so it's sitting at idle, drawing only idle loads. Assuming the idle load of an old server is 100W, how much specialist's time can we allocate to shutting it down?
100W * 8760 h/y * 3y = 2.628 kWh. This will cost us about 657,- EUR or much less if the server fails earlier. So electricity savings alone buy us a bit over 5 person hours for the specialists (at 125,- EUR/h). What do we need: 2x1h for two IT guys to check what servers are possibly unused, 2h for inquiries and talking to probable (ex-)users and the team that was once responsible for that particular project, 1h for the IT management and 1h for the legal team to give the go-ahead. Costs 750,- = about 100,- EUR more than it will ever save.
We still save on cooling, right? Removing 1kWh of input (= heat) requires less than 1kWh for the cooling system. I have no idea how efficient these things are, but a cheap electric heat pump for heating a small house has an efficiency factor of at least 3, so it moves 3kWh of heat for every 1kWh electricity consumed. Larger and more professional installations will probably be more efficient. So to remove 100W of heat, the cooling system consumes about 30W more. So with less cooling, we save another ~200,- EUR over the course of 3 years, which isn't even enough to cover the costs to actually remove the server from the rack, reroute cabling, disassemble the case, destroy the hard drive and dispose of the rest. Costing 250,- EUR, that is 50,- EUR more than it will ever save.
The real savings are in the rack space, depending on the contract and the actual savings of HUs. Assume the price is 40,- EUR/HU/month and we have a 2 HU server. This saves us 960,- EUR per year, or 2.880,- EUR over 3 years under the most ideal conditions imaginable. (If you rent data center space by the rack, it's ZERO savings, since you're saving nothing unless you get permission to clear out an entire rack, which is not going to happen until the servers burn out by themselves)
So you're Head of IT management for a minute: do you give the order to decommission the server, expending 1.000,- EUR and 1 day of your team today, risking angry users and maybe in one way or another violating a data retention obligation by an obscure law or contract that we just forgot about to save maybe 1.300,- EUR per year or less for the next 3 years or shorter? I wouldn't. I would rather allocate my team and resources on a) making absolutely sure our accounting system keeps running perfectly, since every day of outage there would cost us more than 20.000 EUR in interest and b) that big project X has all the resources it needs to finish on time so the 10 expensive consultants working on it cannot bill more hours and upper management does not need to find a person responsible for that.
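The arithmetic in this comment can be redone as a rough sketch. It uses the 3-year lifetime and reads the electricity price as 0,25 EUR/kWh, which is the reading that reproduces the 2.628 kWh and 657 EUR figures. The 8-hour billing day is inferred from 6 hours costing 750 EUR, the cooling factor of 3 is the commenter's own guess, and the function and constant names are made up for illustration.

```python
# Rough break-even sketch for decommissioning one idle server.
# Every figure is an assumption taken from the comment, not measured data.

IDLE_WATTS = 100               # assumed idle draw of an old server
HOURS_PER_YEAR = 8760
YEARS = 3                      # assumed remaining lifetime of the box
EUR_PER_KWH = 0.25             # electricity price
COOLING_FACTOR = 3             # kWh of heat moved per kWh of electricity (a guess)
EUR_PER_HU_MONTH = 40          # rack rent per height unit
HEIGHT_UNITS = 2
SPECIALIST_EUR_PER_DAY = 1000
HOURS_PER_DAY = 8              # assumption: 8-hour billing day


def decommission_balance(review_hours=6, removal_cost_eur=250):
    """Return the savings (by category) and the one-off cost, all in EUR."""
    energy_kwh = IDLE_WATTS / 1000 * HOURS_PER_YEAR * YEARS
    electricity = energy_kwh * EUR_PER_KWH
    cooling = electricity / COOLING_FACTOR
    rack = EUR_PER_HU_MONTH * HEIGHT_UNITS * 12 * YEARS
    labour = review_hours * SPECIALIST_EUR_PER_DAY / HOURS_PER_DAY
    return {
        "electricity": electricity,
        "cooling": cooling,
        "rack": rack,
        "cost": labour + removal_cost_eur,
    }


if __name__ == "__main__":
    for item, eur in decommission_balance().items():
        print(f"{item:12s} {eur:8.0f} EUR")
```

With these defaults the one-off cost comes to 1.000 EUR, and nearly all of the saving sits in the rack rent. If space is rented by the rack, set `EUR_PER_HU_MONTH` to 0 and the balance goes negative.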
Re:Money by plopez · 2015-06-21 16:13 · Score: 1

It's a cheap energy problem. If energy were more expensive it would be worth it to take them offline.

--
putting the 'B' in LGBTQ+
Re:Money by thegarbz · 2015-06-21 20:29 · Score: 2

Depends. If you can virtualise, then you can over-provision. I'd love having multiple people pay rent for the same system.
Re:Money by jabuzz · 2015-06-21 20:43 · Score: 1

You failed to account for the system admin time to keep the server patched and secure. Also you assume that everyone is renting rack space and it is infinite in supply.
These constraints mean that in my experience when a box is no longer doing anything useful it gets issued with a shutdown command to save the power. At this point if it really is required and a user somewhere starts shouting I can power it back up in a couple of minutes.
Then generally six to 12 months later it gets removed from the rack because the space the servers are occupying is required for big project X.
Re:Money by Skapare · 2015-06-21 21:50 · Score: 1

You failed to account for the system admin time to keep the server patched and secure. Also you assume that everyone is renting rack space and it is infinite in supply.
either that or it's running Linux with zero licensing costs to stir management.

--
now we need to go OSS in diesel cars
Re:Money by jbolden · 2015-06-22 01:03 · Score: 2

At this point for almost all companies good quality colo space is infinite. Most times a company isn't even using a meaningful fraction of their colo's space and so they could double or triple instantly without hassle, much less add an extra 33%. And even if their colo doesn't have room, others directly connected to it do have extra space... So consider space infinite once you are willing to rent.
That being said, I have problems believing the 1/3rd of servers figure from the article. That's not my experience at all.
Re: Money by funky_vibes · 2015-06-23 04:26 · Score: 1

You can't access your server?
Oh, we shut it down because it didn't receive any connections for a week.

Yes, it's called redundancy by Anonymous Coward · 2015-06-21 02:39 · Score: 1

We need enough servers for peak load, not average load.

Re:Yes, it's called redundancy by prefec2 · 2015-06-21 02:53 · Score: 1

True, but in that case these machines do something sometimes over the year. In a modern data center you would be able to shut down the servers not used for a longer period and restart them automatically when the load rises. A hardware server start may take ten minutes (if there is not much to synchronize), but as you should know your load profile and use load estimation techniques, you can start the servers in advance. Especially in the context of replication of JVM and .Net components, this should be pretty easy. It is more complicated with databases as they might need some time to synchronize with each other.
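Starting servers in advance from a known load profile could look roughly like the sketch below. It assumes the forecast is a list of expected load per time slot, that every server handles the same load, and that booting takes a whole number of slots. All names are hypothetical and this is not any particular product's autoscaler.

```python
import math


def start_plan(forecast, capacity_per_server, boot_slots=1):
    """For each time slot, return how many servers must be running or booting.

    A server needed at slot t has to be started boot_slots earlier, so each
    slot looks ahead that far and takes the largest upcoming requirement.
    """
    needed = [math.ceil(load / capacity_per_server) for load in forecast]
    plan = []
    for t in range(len(needed)):
        window = needed[t:t + boot_slots + 1]  # now plus the look-ahead
        plan.append(max(window))
    return plan


if __name__ == "__main__":
    # Hypothetical load profile in requests/s, 100 requests/s per server.
    print(start_plan([100, 100, 400, 900, 300], capacity_per_server=100))
```

The look-ahead window is the whole trick: the plan ramps up one slot before the forecast peak rather than at it. It only works as well as the forecast does, which is the caveat the replies below raise.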
Re:Yes, it's called redundancy by petes_PoV · 2015-06-21 03:10 · Score: 4, Informative

In a modern data center you would be able to shutdown the servers not used for a longer period and restart them automatically when the load rises.
Many businesses that rely on servers (i.e. all of them) will be running hot standby systems - ones that can automatically take load if there's a hardware failure or software problem.
One major (world-ranked) international company I consulted at was legally required to have 100% failover capacity - so it was inevitable that they would automatically have 50% of their production servers performing no functions - except for the twice a year when they were "flipped" just to make sure that each set of servers worked as expected.
Although the source paper does specify physical "zombie" servers, if you need failover VMs, the same basis is applied there, too.

--
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Re:Yes, it's called redundancy by tepples · 2015-06-21 03:21 · Score: 1

One major (world-ranked) international company I consulted at was legally required to have 100% failover capacity - so it was inevitable that they would automatically have 50% of their production servers performing no functions - except for the twice a year when they were "flipped" just to make sure that each set of servers worked as expected.

Why flip them twice a year and not, say, weekly?
Re:Yes, it's called redundancy by iamacat · 2015-06-21 03:25 · Score: 2

A hardware server start may take ten minutes - if it actually comes up successfully. If you are starting a cluster in an emergency outage, you never know how many servers, power supplies and network switches kicked the bucket since you last used them. Plus, your DNS, NFS, db and other dependencies have to be unaffected by the outage and handle the added load of hundreds of servers starting at the same time. If you do a staggered restart of 100 servers in groups of 10, that's an hour and 40 minutes of outage if everything goes without a hitch. Worth the power savings from idle standby?
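The hour and 40 minutes follows directly from the numbers given. A minimal sketch, assuming groups start strictly one after another and each takes the full ten minutes (the function name is invented for illustration):

```python
import math


def staggered_restart_minutes(servers, group_size, minutes_per_group):
    """Total wall-clock time when groups are started one after another."""
    groups = math.ceil(servers / group_size)
    return groups * minutes_per_group


if __name__ == "__main__":
    total = staggered_restart_minutes(100, 10, 10)
    hours, minutes = divmod(total, 60)
    print(f"{total} min = {hours} h {minutes} min")
```

This is the best case. Any group that fails to come up cleanly adds to the total.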
Re:Yes, it's called redundancy by Iamthecheese · 2015-06-21 03:36 · Score: 2

Because doing it right involves a full fail-over test including transferring loads or test loads, DNS auto-reconfiguration, and possibly even paying extra to bring up extra capacity elsewhere. You need to make sure it happens right when it's needed. Extra paperwork, overtime, it's all in there.

--
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
Re: Yes, it's called redundancy by prefec2 · 2015-06-21 04:45 · Score: 1

You are absolutely right. If these servers provide fail over then they must be present. The server start stop thing only applies to load management, e.g. for web shops. As I stated earlier (maybe it was in another post), fail over servers, like all redundancy related infrastructure, are not useless. They serve a purpose. Therefore they cannot be stopped without getting into trouble.
Re:Yes, it's called redundancy by Hadlock · 2015-06-21 07:14 · Score: 1

In our case, about 20% of our servers are outdated and not kept as well maintained, as they used to host some important service, but their new replacement was built and that service was migrated, but nobody's 100% sure if there were any other latent, less important services running on that machine. So it stays on because everybody has more important things to do than find out what else is running on there, and perhaps more importantly, nobody wants to be the guy who shuts down the server that's still running some process someone relies on. So one or two servers of each type stay on, indefinitely, or until extended support finally ends. And yeah we have some physical redundant servers but for the most part everything is a VM now and we just have one or two redundant VM hosts at our DR site. And an idle old server doesn't consume much of anything besides a gig or two of ram.

--
moox. for a new generation.
Re:Yes, it's called redundancy by mlts · 2015-06-21 07:17 · Score: 1

Some servers (IBMs, HP ProLiants) have decent power management capabilities, so the boxes can stay on and be idle... but consume a relatively small amount of electricity and cooling. Add a SSD for local storage and swap (start the OS or hypervisor and let the SAN take it from there), and even the energy usage of spinning disks can be minimized.
However, with the many ways and layers to do HA, might as well do active/active if possible. On the VMWare side of the house, DRS comes to mind, and it also supports true fault tolerant VMs [1].
Generally, with modern applications (Oracle RAC, for example), active/passive configurations are tending to go the way of the dodo, replaced by active/active, where each server runs about 1/2 the workload. Other applications use a public IP or name, and load-balance, so that a dead box means stuff continues on, but just fewer machines to service the incoming items. Even the IBM mainframes that use Parallel Sysplex are active/active (as both units run all computations in lock-step with one another.)
Of course, there are some applications that are active/passive, such as the legacy IBM HA systems... but those are the exception.
1/3 of all servers not doing things might make sense... in the fact that there are times where a machine decides to eat itself, but the IT staff just doesn't know, or doesn't bother to fix it. For example, some machine that was a development machine for a long-forgotten project that fell out of favor. A lot of companies have internal policies to shut down physical servers, but in the smaller data centers that SMBs are likely using, those policies of decommissioning, or just auditing everything in the data center and yanking anything not being used are not there.
[1]: True fault tolerant, as in the VM actually has two instances of it, each on a different physical machine, so if the primary VM drops, the second takes over in milliseconds. The downside is that the VM can only have one vCPU, and there are a number of other limitations. However, for a licensing server or some other task for which even the delay to reboot a VM and autostart processes might be too much, it is a useful tool.
Re:Yes, it's called redundancy by bhiestand · 2015-06-21 10:05 · Score: 1

Because doing it right involves a full fail-over test including transferring loads or test loads, DNS auto-reconfiguration, and possibly even paying extra to bring up extra capacity elsewhere. You need to make sure it happens right when it's needed. Extra paperwork, overtime, it's all in there.
If the system is architected well, shouldn't all of those steps be automated... including monitoring and failover success/failure?

--
SWM seeks new sig for a brief fling
Re:Yes, it's called redundancy by thegarbz · 2015-06-21 20:31 · Score: 1

I can imagine that this wouldn't be perfectly smooth. It may be automated but it may not be completely bumpless and I don't think a company would be happy if users see a "scheduled maintenance" sign for 15 min or however long it takes every week.
Re:Yes, it's called redundancy by Thumper_SVX · 2015-06-22 12:59 · Score: 1

If the system is architected well, shouldn't all of those steps be automated... including monitoring and failover success/failure?
In a perfect world, with perfect systems documentation you'd be right. Unfortunately few of us have the pleasure of working in such an environment :)
Re:Yes, it's called redundancy by bhiestand · 2015-06-27 07:03 · Score: 1

I fear you may be right, and that's exactly why they don't do it more often... but I think that also underscores my point a bit. Shouldn't they work to get it to the point where users won't be impacted?
Netflix does this pretty aggressively and users don't seem to notice. Though I realize for most companies I am being very idealistic.

--
SWM seeks new sig for a brief fling

Sounds about right. by Anonymous Coward · 2015-06-21 02:42 · Score: 2, Insightful

One in three people consumes energy and produces nothing interesting.

Re: Sounds about right. by Anonymous Coward · 2015-06-21 03:08 · Score: 2, Funny

Like this comment.
Crap, now it's 2 out of 3.
Re:Sounds about right. by Zontar+The+Mindless · 2015-06-21 05:32 · Score: 1

"Decease" usually implies that the subject didn't survive.

--
Il n'y a pas de Planet B.
Re: Sounds about right. by TheRealHocusLocus · 2015-06-21 23:26 · Score: 1

3 for 3.
One for all, and all for one!
Why is this article (in general) ruffling so many feathers? Because it is a thinly-disguised Malthusian Energy hit-piece specifically targeted at the center of IT's most sacred golden calf, the cloud server industry. The reason that the assumptions made in this study are confusing to many (as in, why are we even on this page? Isn't an overall one-third quiescent portion a sign of a properly engineered critical system?) is that it was not motivated by intelligent resource usage concerns at all.
Energy-environmentalists are like beavers these days. Their teeth are always growing, so they have to gnaw on something. So today they are gnawing on you. These hit pieces are everywhere these days.
Energy usage on every conceivable scale is the 'new' pseudo-environmentalism, and the bar of publishable relevance has been set low so that everyone can participate. So they do. In the olden days you could enjoy your hot shower without guilt and read a book in the brilliance of that 100 watt light bulb... secure in the knowledge that so long as you were part of a team that was striving towards a general goal of greater efficiency on some massive scale, or heading off the problem entirely by developing cheaper and less limited sources of energy, you were a net 'positive' for humanity. And you were.
Somewhere along the line WE let tabloid environmentalism take over, and the scale was tipped towards presumptive guilt. WE let this happen. This is a religious mental disorder for which no actual religion is necessary. Now the merest accusation of wastefulness gains traction because it resonates with that "we're fucking up the planet" meme, and the burden of proof has shifted to YOU as the individual to 'prove' you are a net-positive or at least a neutral. Whether you are conscious of it or not you have bought into an idea of Original Sin.
It's time to reject the notion that energy is somehow in 'short supply', 'expensive' or 'harmful to the planet'. What is actually in short supply these days is the innovative drive to secure better base load energy sources. And what there is a useless glut of are people striving for (and achieving) ten minutes of fame by pointing out some comparatively tiny 'waste' of energy somewhere, and using that fame (a phenomenon enabled by click-through environmentalists)... to put some one-ten-thousandth of one-millionth of humanity's energy usage 'on trial'. It diverts you from your daily pursuit, whatever that is. It may deliver the illusion that you're making a positive contribution just by reading the stuff. Nope.
Beaver-chewing on specific industries that are built with redundancy and a certain amount of slack for various reasons, many good, is a waste of time. The best design is an over-design after all, and the real world is old-school. Only those working on solving the BIG problems at any given time are our best real hope.
Don't distract those people, wherever they may be.
For all we know there may be just a few left.
The campaign to develop standard plans for a launch vehicle to intercept asteroid threats stands at 174 people and $8,447 raised of $200k with 20 days left to go. If it was some silly little Raspberry Pi thing it would be funded already many times over. And I was hoping this was the Smartest Generation.

--
<blink>down the rabbit hole</blink>

Re:Zombies or fail over? by prefec2 · 2015-06-21 02:49 · Score: 4, Informative

A fail over server is not considered useless. They did not monitor server output and decided then after a period of time that the servers were not doing anything. You can infer this knowledge by reading the "paper", as they switched these servers off after identifying them. Switching off fail over servers normally would raise alarms and then you get thrown out ;-) So you could safely assume that they mean unused servers.

Bad Title by seven+of+five · 2015-06-21 02:51 · Score: 4, Informative

Reading the title, my first thought was, cripes, those botnets have taken over everything!

Re:Bad Title by zr · 2015-06-21 04:41 · Score: 1

made you click..
Re:Bad Title by JustAnotherOldGuy · 2015-06-21 04:42 · Score: 2

Lol, I had the same exact thought here.

--
Just cruising through this digital world at 33 1/3 rpm...

Re:Because its impossible to get rackspace by pivot_enabled · 2015-06-21 02:55 · Score: 1

No you get to keep the rack/rack units/cage space once you have acquired it as long as you pay your bill.

3 page "paper" not all that insightful. by Anonymous Coward · 2015-06-21 02:57 · Score: 1

Apparently, the researchers have never heard of business continuity planning. If your primary data center gets knocked offline because your company located it in a hurricane-prone area of the country in order to take advantage of state tax breaks and a cheaper labor force (happens all the time), then you're gonna need another site you can switch your data/voice traffic over to instantly when the inevitable hurricane hits. That means maintaining a certain amount of redundant equipment at the failover site that will largely just sit there until it's needed for disaster recovery. None of this is mentioned in that paper that reads more like an advertisement for some software that measures energy efficiency.

Re:3 page "paper" not all that insightful. by rubycodez · 2015-06-21 03:50 · Score: 1

Moreover the power draw of systems under normal load vs. idle can be three to one.
Besides failover there are "swing" servers where virtual machines or services are migrated while upgrades are done elsewhere. There are "staging" servers that become busy while new software is being rolled out but might otherwise be idle for months.
Note the power draw of an idle server can be a third or less what the normal load is.
The twats that wrote this paper obviously aren't in the business.

Obviously by penguinoid · 2015-06-21 03:01 · Score: 5, Insightful

Those are the servers hosting Slashdot's new "share" button. No one's ever clicked on it.

--
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways

Re:Obviously by jones_supa · 2015-06-21 03:31 · Score: 2

Also the "Share" links under comments are quite redundant as well IMHO.

They are not consuming 30% of power by iamacat · 2015-06-21 03:16 · Score: 3, Insightful

Modern systems are good at reducing power consumption when idle. It's quite reasonable to have 30% of capacity as spares, reserve for unexpected load, capacity for new apps and so on. They probably consume 3% of the power and nobody is motivated enough to look for more savings. Keeping things completely off is problematic, because you never know how much of the hardware and software will come up in time to handle an emergency unless you run and test it all the time.

There is certainly room for further environmental/financial improvement, but the 30% figure is sensationalized.

Re:They are not consuming 30% of power by Dagger2 · 2015-06-21 03:30 · Score: 2

Maybe. But on the other hand, even active servers spend a lot of their time idle (the paper says server utilization "rarely exceeds 6%"), and I bet a lot of these "comatose" servers are actually long-forgotten old hardware, or machines that nobody can be bothered to decommission -- it's possible that on average they're older than active servers and thus eating a lot more power.
Re:They are not consuming 30% of power by umghhh · 2015-06-21 04:15 · Score: 1

In my previous project we had to save costs so much that we never had an updated document/service showing current booking/usage of our development, test and target servers. The result was that we had to negotiate use of some chains, work in shifts etc while some chains were idling. I have not lasted till the end so I do not know how successful the project was. I guess it was very successful - after they switched off all the machines they had more power than they budgeted i.e. managed to get to profit without customers!
At least on paper...

Chaos Monkey by Netflix by tepples · 2015-06-21 03:19 · Score: 2

I was under the impression that a fail-over server that does not occasionally handle traffic in periodic tests could not be trusted to handle traffic in a true failure situation. Netflix routinely conducts tests of its failover infrastructure, shutting down large blocks of its leased Amazon capacity to make sure the rest of its capacity can keep up.

Re:Chaos Monkey by Netflix by pnutjam · 2015-06-22 04:24 · Score: 1

Whoa....
take your competency and get out of this discussion.

--
Cheap storage VM.

Re:Zombies or fail over? by slydder · 2015-06-21 03:21 · Score: 5, Informative

I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.

It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.

--
IT Admins Group: Where you decide the content

What are their metrics for being a zombie? by caseih · 2015-06-21 03:32 · Score: 1

How do they judge whether or not a server is contributing useful information? I have two personal VPSs out there that do almost nothing on the public internet. They mostly act as a place where I can store data as a form of backup, but also a place I can access when I need it to test programs, get a really fast download, etc. But most of the time these VPSs just act as central nodes in my private VPN. So by their definition are my servers in the 1/3 "zombie" servers? I pay the rent, so to speak, so I'm paying for the energy costs.

Re:Zombies or fail over? by pfleming · 2015-06-21 03:33 · Score: 2

I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.

It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.

How is it a failover server if no data has traveled into or out of the machine in six months? Wouldn't you want to keep a failover server up to date (data and software updates) so you don't notice the failover? What good is a failover server if you have to load six months of data from tape? The machine could be off until you need it in that case.

Re:Zombies or fail over? by rubycodez · 2015-06-21 03:41 · Score: 2

wrong, you don't understand how it's usually done these days

it only need have the ability to access a SAN where replicated information from the primary server exists

you will not see any data movement to the machine

"massive indictment" by fche · 2015-06-21 03:46 · Score: 1

... of purple prose.

The mere existence of servers on standby is not a problem, let alone a "massive" one.

Pretty close to... by AchilleTalon · 2015-06-21 03:49 · Score: 1

This ratio seems pretty close to the ratio of zombie public servants.

--
Achille Talon
Hop!

Bad terminology by pubwvj · 2015-06-21 03:55 · Score: 4, Insightful

Unfortunate confusion of terminology. "Zombie computers" is a term also used to mean those taken over by bot nets.

Re:Bad terminology by Anonymous Coward · 2015-06-21 05:26 · Score: 1

Yes, that's exactly what I thought when I read the title. And I sense it was on purpose. Why, otherwise, use "comatose" everywhere else but the title?

Re:Zombies or fail over? by umghhh · 2015-06-21 03:55 · Score: 1

well I had an encounter once with a product that had a feature developers called poor man's redundancy/failover - a doubled system that was neither really redundant nor was it able to failover. Switching off the other machine would indeed save some costs.

Re:Zombies or fail over? by umghhh · 2015-06-21 04:00 · Score: 1

This is probably wrong - assuming a solution you propose is used (which does not have to be the case) you would still want to run some sort of watchdog signalling to be sure failover machine is up and ready. This means effectively you would still have some communication.

Re:Because its impossible to get rackspace by Culture20 · 2015-06-21 04:18 · Score: 1

So you leave zombies on the wire, staking your claim. Then when you need the space, you swap it. Otherwise it's an endless wait for power, cooling, CAB, governance, and all sorts of fail.

No you get to keep the rack/rack units/cage space once you have acquired it as long as you pay your bill.

In a co-location environment, yes. In a standard business environment, the GP's response is true.
Of course, the article's definition of "useful" might not be a sysadmin's definition of "useful". Redundant machines, backup machines, extra capacity machines, dev machines, test machines, support machines, etc. all might be considered non-useful to customers, sales department, HR, or the CEO.

Re:Zombies or fail over? by rubycodez · 2015-06-21 04:19 · Score: 3, Informative

yes, but these researchers were ignoring traffic below a certain threshold.
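A threshold rule like that might be sketched as follows. This assumes a server counts as comatose when every day in the observation window stays under some traffic floor. The actual threshold and window the researchers used are not stated in this thread, so the numbers and names here are placeholders.

```python
def is_comatose(daily_bytes, threshold_bytes):
    """True if there is data and every day's traffic is below the threshold."""
    return bool(daily_bytes) and all(b < threshold_bytes for b in daily_bytes)


if __name__ == "__main__":
    ONE_MB = 1_000_000  # placeholder threshold, not the paper's value

    # A standby box that only exchanges watchdog heartbeats (~50 kB/day).
    standby = [50_000] * 180
    # A box that is quiet except for one busy day.
    occasional = [50_000] * 179 + [5_000_000_000]

    print(is_comatose(standby, ONE_MB))
    print(is_comatose(occasional, ONE_MB))
```

Under such a rule a heartbeat-only standby is indistinguishable from a forgotten box, which is the objection raised elsewhere in this thread.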

Re:Zombies or fail over? by msobkow · 2015-06-21 04:34 · Score: 1

No, in the big banks, it's the disk servers that do the mirroring themselves, not the application servers. Except for software updates and configuration changes, the application servers just sit idle at the backup site.

--
I do not fail; I succeed at finding out what does not work.

Re: Zombies or fail over? by prefec2 · 2015-06-21 04:47 · Score: 1

Well if it is broken, fix it. In your case the system was intended to be a redundant server, however, it did not provide real redundancy.

So, an average 1.33 safety factor? by spiritplumber · 2015-06-21 05:21 · Score: 2

A bit low, but reasonable. Try making stuff that goes on ships, there's usually double redundancy AND a completely mechanical system in case everything goes to pot.

--
Liberty - Security - Laziness - Pick any two.

Re:So, an average 1.33 safety factor? by spiritplumber · 2015-06-21 13:31 · Score: 1

I have, it was fun! Nothing mission critical though. The shipboard stuff was critical however.

--
Liberty - Security - Laziness - Pick any two.
Re:So, an average 1.33 safety factor? by dj245 · 2015-06-21 13:48 · Score: 1

Try making stuff that goes on ships
Try making stuff that goes on aircraft...
Aircraft and spaceships have weight restrictions. Weight (and often volume) on a ship is not important. So you can literally have 2 of every system in many cases.

--
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.

A story by CanadianMacFan · 2015-06-21 05:30 · Score: 1

Back when I was a sysadmin for a government department I had been assigned a couple of chassis of HP blades that were bought in one of the famous fiscal year end splurges. For the most part I had no use for them and I didn't even install Linux on them. I think I only ever used a couple of blades and I hated them. It was the first generation and they ran very hot and we had lots of issues with bad RAM. The other three chassis on the rack belonged to the VMWare team and were in heavy use.

Since I had no need for the servers to be on I tried to have the blades turned off but the VMWare team was always turning them on. I would go on occasionally and turn off all of my machines but not long after they would be back on doing absolutely nothing. I never got a good answer from them why my servers had to be turned on wasting electricity and heating up the data centre (especially since special cooling had to be installed for the HP blades, yes they ran that hot).

Re:Zombies or fail over? by msobkow · 2015-06-21 05:39 · Score: 1

You're a particularly special kind of "stupid", aren't you?

The disk servers are mirroring to the backup disk servers, obviously. And I used the term "disk server" because there are several vendors and brands of products available that do the same job.

--
I do not fail; I succeed at finding out what does not work.

Isn't this just on demand processing? by rsilvergun · 2015-06-21 05:41 · Score: 1

The last time Microsoft had a major Xbox Live outage due to high demand, they just spun up a bunch of VMs and everything was fine 4 hours later. You keep them idling so that when you need 'em they're ready at a moment's notice. Also, if you're not Microsoft or Oracle, this means you're not paying the licensing costs associated with the software being in production nonstop.

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/

Regulations and data retention by jroysdon · 2015-06-21 06:26 · Score: 1

I know that in the industry I'm in, we have regulations which require 3+ years of data retention, which "isn't providing anything useful" until it is. If we have a legal "issue" then that will extend until the legal issue goes away and the judge says we can destroy data. While we can use archive methods, sometimes the live system is really what is needed to retrieve data. It's better to just keep disks spinning than shut them down and hope they spin back up.

IT has a long tail where I work. Things that are planned to last 5 years often have a good deal of life for another 2-5 years (not all, but many). The "usage" of these systems may only be once a month, quarter, or even annually, but that makes more sense than porting over data that doesn't need to be kept in the replacement system.

Many times, even when we have an "official" cutoff for a system, we just power it down and let it sit in the rack until the next year's inventory, at which time it is sent off to the auction yard (sans hard drives) to be bid on by the pallet load.

Re:Zombies or fail over? by mlts · 2015-06-21 06:56 · Score: 1

I can see some machines snoozing for long periods of time, but not 1/3 of a place:

1: Hypervisor-level failover on VMWare or Hyper-V. Generally there are hypervisor updates, such as the recent SSL holes which required an update on ESXi, and other security items on Hyper-V [1]. However, these can sit for a good while untouched, and ready to handle a vMotion punt at a moment's notice.

2: Failure on an active/passive configuration at the DB level. With something like Oracle RAC that costs a lot for licensing, why not just go active/active? In general, the DB application and the OS should be upgraded more often, but I can see someone just tossing

3: IBM PowerHA. Since the virtualization firmware is generally upgraded during the latter months (when new TLs/MLs come out), these machines can probably sit around doing nothing for most of a year.

[1]: I'm meaning "raw" Hyper-V servers that are not part of a Windows Server OS install. Neglecting a Windows Server OS install is just asking for the box to become a "client" of another sort.

Re:Zombies or fail over? by mlts · 2015-06-21 07:00 · Score: 1

Grr, quick addendum on #2: I can see a firm just tossing the OS and application on a machine and walking off, but in general that isn't a good practice.

Re:Zombies or fail over? by myowntrueself · 2015-06-21 07:55 · Score: 1

I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.

It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.

I'm with you in general but it can be incredibly difficult to get an estimate from business intelligence on how much you actually stand to lose per hour of downtime.
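
The break-even question quoted above can be put into a few lines of arithmetic. This is only a rough sketch: the function names and every figure in it are hypothetical placeholders, and the hard part, as noted, is getting a believable loss-per-hour number in the first place.

```python
def failover_worth_it(loss_per_hour, outage_hours_per_year, failover_cost_per_year):
    """Return True if the expected yearly outage loss exceeds the standby's yearly cost.

    All three inputs are estimates supplied by the caller (hypothetical here).
    """
    expected_loss = loss_per_hour * outage_hours_per_year
    return expected_loss > failover_cost_per_year


def break_even_loss_per_hour(outage_hours_per_year, failover_cost_per_year):
    """Hourly downtime loss at which the standby exactly pays for itself."""
    return failover_cost_per_year / outage_hours_per_year


# Made-up example: 4 hours of expected outage per year, standby costs 12000 per year.
print(failover_worth_it(10000, 4, 12000))
print(break_even_loss_per_hour(4, 12000))
```

If the business can only say "somewhere between X and Y per hour", comparing both ends of that range against the break-even figure is often enough to settle the argument.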

--
In the free world the media isn't government run; the government is media run.

Clickbait by buckfeta2014 · 2015-06-21 08:07 · Score: 1

People pay for servers all the time and never use them. If they paid for the year, then why should the hoster care?

--
Buck Feta. You know what to do.

Re:Clickbait by guruevi · 2015-06-21 13:00 · Score: 1

Sometimes the hosting company doesn't keep track of it either. I recently was involved in a datacenter decommission (20yo) and at the end of the day a shit ton of hardware was still happily humming along; nobody claimed it, nor did anyone keep track of whose it was (most likely they once did but moved asset tracking platforms, which missed certain things).
About a decade ago another company I worked for had a similar thing where we expanded the datacenter and started keeping accurate track of new assets. Again, bunches of devices were left over with no traces at all in asset or order tracking and entire racks of cable runs had to be cut out with chainsaws (after the power was removed). We had a notorious salesperson that would often allow 'good' clients in a pinch (often due to administrative and physical disaster at neighboring centers) to get colocation and even dedicated servers, rush built/installed after hours and installed with "we'll do the paperwork on Monday".

--
Custom electronics and digital signage for your business: www.evcircuits.com

Causes of hording. by Tablizer · 2015-06-21 17:03 · Score: 1

From personal experience, the bureaucracy of our org makes it that procurement of servers is so difficult that section managers tend to horde them when they get them.

I'm hoping virtualization will improve this situation, but something tells me it will only create different problems. The bureaucratic culture usually invents new ways to foul up new tools.

--
Table-ized A.I.

Re:Causes of hording. by Skapare · 2015-06-21 21:40 · Score: 1

then they will hoard hordes of instances.

--
now we need to go OSS in diesel cars
Re:Causes of hording. by jbolden · 2015-06-22 01:09 · Score: 1

One way to handle that is to not own your infrastructure and just rent month to month from the vendor who provides a pool of servers. What you are likely facing is the problem of how to prevent the administrative cost from going above X% by preventing the IT administrative cost from going above Y% by slowing down acquisitions... Better yet is just to guarantee Y and save the labor.
Re:Causes of hording. by Tablizer · 2015-06-22 06:01 · Score: 1

For security reasons, the org in question wants mostly internal servers. But if they ran it kind of like a vendor, it may work in that each section has to pay for any server instance it's using, through the budgeting process. But, the org in question would probably bungle that too.

--
Table-ized A.I.
Re:Causes of hording. by jbolden · 2015-06-22 06:34 · Score: 1

The department of defense runs servers out of house. Lockheed Martin runs a cloud provider. Many of the country's banks handle it. There is no question you can buy better security than any company has internally.
As for running an internal cloud that's pretty easy and they could ask a vendor to run the financial it while keeping all the servers physically on their prem.

turn them into mail servers ... by Skapare · 2015-06-21 21:34 · Score: 2

turn them into mail servers ... then spammers will keep them active.

--
now we need to go OSS in diesel cars

Re:Easy to fix by Skapare · 2015-06-21 21:42 · Score: 1

don't let management know you are doing this. they will say you are wasting time.

--
now we need to go OSS in diesel cars

Yahoo in Northern VA by kriston · 2015-06-22 03:15 · Score: 1

There's this rumor that when Yahoo expanded its Lockport "chicken coop" data centers in upstate NY they vacated at least two large data centers in Northern VA and because the lease isn't up for another two years they have been mostly empty ever since.

Yet, Yahoo is saving lots of money by doing this.

--

Kriston

Sorry, but that sounds like ignorance by whitroth · 2015-06-22 04:57 · Score: 1

You do not spec for "average" usage; you spec for *max*. You also have to spec for how many machines (when we're talking about thousands, or tens of thousands of servers) are going to fail today, to be picked up by the "zombie" machines that are, in fact, hot spares.

And then there's the Big Events, like the shooting in Charleston, or when the SCOTUS announces about gay marriage or the ACA - how many of those "zombie" machines are going to go live to help carry the traffic load?
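
A back-of-the-envelope version of "spec for max, plus spares" might look like the sketch below. The function name, the per-server capacity, and the failure rate are all assumptions made up for illustration, not figures from the study or from any real fleet.

```python
import math


def servers_to_provision(peak_requests_per_sec, capacity_per_server, failures_per_thousand_per_day):
    """Size a fleet for peak load, then add hot spares for expected daily failures.

    Returns (active, spares). All inputs are hypothetical estimates.
    """
    # Enough machines to carry the peak, not the average.
    active = math.ceil(peak_requests_per_sec / capacity_per_server)
    # Spares to absorb the machines expected to fail in one day.
    spares = math.ceil(active * failures_per_thousand_per_day / 1000)
    return active, spares


# Made-up example: 90000 req/s peak, 100 req/s per server, 10 failures per 1000 servers per day.
active, spares = servers_to_provision(90000, 100, 10)
print(active, spares)
```

On an average day, everything sized for the gap between average and peak load, plus the spares, would look idle to a traffic-based survey.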

mark

Before IPMI... by PapaSurf · 2015-06-22 06:10 · Score: 1

It cost $50 to get the data chimp to power a server on.
