1 In 3 Data Center Servers Is a Zombie
dcblogs writes with these snippets from a ComputerWorld story about a study that says nearly a third of all data-center servers are comatose ("using energy but delivering no useful information"). What's remarkable is this percentage hasn't changed since 2008, when a separate study showed the same thing. ... A server is considered comatose if it hasn't done anything for at least six months. The high number of such servers "is a massive indictment of how data centers are managed and operated," said Jonathan Koomey, a research fellow at Stanford University, who has done data center energy research for the U.S. Environmental Protection Agency. "It's not a technical issue as much as a management issue."
It's not a management issue, either - it's money. People cost more than dead servers.
Have you read my blog lately?
We need enough servers for peak load, not average load.
One in three people consumes energy and produces nothing interesting.
A failover server is not considered useless. They did not monitor server output and decided then after a period of time that the servers were not doing anything. You can infer this knowledge by reading the "paper", as they switched these servers off after identifying them. Switching off failover servers normally would raise alarms and then you get thrown out ;-) So you could safely assume that they mean unused servers.
Reading the title, my first thought was, cripes, those botnets have taken over everything!
No you get to keep the rack/rack units/cage space once you have acquired it as long as you pay your bill.
Apparently, the researchers have never heard of business continuity planning. If your primary data center gets knocked offline because your company located it in a hurricane-prone area of the country in order to take advantage of state tax breaks and a cheaper labor force (happens all the time), then you're gonna need another site you can switch your data/voice traffic over to instantly when the inevitable hurricane hits. That means maintaining a certain amount of redundant equipment at the failover site that will largely just sit there until it's needed for disaster recovery. None of this is mentioned in that paper, which reads more like an advertisement for some software that measures energy efficiency.
Those are the servers hosting Slashdot's new "share" button. No one's ever clicked on it.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
Modern systems are good at reducing power consumption when idle. It's quite reasonable to have 30% of capacity as spares, reserve for unexpected load, capacity for new apps and so on. They probably consume 3% of the power and nobody is motivated enough to look for more savings. Keeping things completely off is problematic, because you never know how much of the hardware and software will come up in time to handle an emergency unless you run and test it all the time.
There is certainly room for further environmental/financial improvement, but the 30% figure is sensationalized.
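To put rough numbers on the 30% / 3% claim above, here is a minimal sketch. It assumes every active server draws the same power and every idle server draws a fixed fraction of that; both function names and the simplification are mine, not from the study or the article. It works out how low the idle draw has to be for 30% of the servers to account for only 3% of the power.

```python
def idle_power_share(idle_fraction, idle_to_active_ratio):
    """Share of total fleet power drawn by idle servers.

    idle_fraction: fraction of servers that are idle (e.g. 0.30).
    idle_to_active_ratio: idle server power divided by active server power
                          (an assumed, fleet-wide constant).
    """
    idle_power = idle_fraction * idle_to_active_ratio
    active_power = 1.0 - idle_fraction  # active servers normalized to 1.0 each
    return idle_power / (idle_power + active_power)


def ratio_for_share(idle_fraction, target_share):
    """Idle-to-active power ratio at which idle servers draw target_share.

    Obtained by solving share = f*r / (f*r + (1 - f)) for r.
    """
    return target_share * (1.0 - idle_fraction) / (idle_fraction * (1.0 - target_share))


if __name__ == "__main__":
    r = ratio_for_share(0.30, 0.03)
    print(f"idle servers would need to draw about {r:.1%} of an active server's power")
```

Under those assumptions the 3% guess only holds if an idle box draws roughly 7% of what a busy one does. Whether real hardware idles that low is a separate question this sketch does not answer.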
I was under the impression that a fail-over server that does not occasionally handle traffic in periodic tests could not be trusted to handle traffic in a true failure situation. Netflix routinely conducts tests of its failover infrastructure, shutting down large blocks of its leased Amazon capacity to make sure the rest of its capacity can keep up.
I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.
It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.
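The comparison in that last sentence can be sketched in a few lines. The function name and parameters are hypothetical, and the optional outage probability is my addition; leaving it at 1.0 reproduces the comment's straight comparison of one failure's loss against a year of fail-over cost.

```python
def failover_pays_off(annual_failover_cost, outage_hours, loss_per_hour,
                      outage_probability=1.0):
    """Return True if the expected loss from an outage exceeds the yearly
    cost of keeping a fail-over server running.

    outage_probability is an assumed chance of the outage happening within
    the year; 1.0 means "assume it happens".
    """
    expected_loss = outage_probability * outage_hours * loss_per_hour
    return expected_loss > annual_failover_cost


if __name__ == "__main__":
    # Illustrative numbers only: $10k/year standby, 8-hour outage, $5k/hour lost.
    print(failover_pays_off(10_000, 8, 5_000))
    print(failover_pays_off(10_000, 8, 5_000, outage_probability=0.1))
```

The hard part in practice is the loss-per-hour input, not the arithmetic.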
IT Admins Group: Where you decide the content
How do they judge whether or not a server is contributing useful information? I have two personal VPSs out there that do almost nothing on the public internet. They mostly act as a place where I can store data as a form of backup, but also a place I can access when I need it to test programs, get a really fast download, etc. But most of the time these VPSs just act as central nodes in my private VPN. So by their definition are my servers in the 1/3 "zombie" servers? I pay the rent, so to speak, so I'm paying for the energy costs.
I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.
It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.
How is it a failover server if no data has traveled into or out of the machine in six months? Wouldn't you want to keep a failover server up to date (data and software updates) so you don't notice the failover? What good is a failover server if you have to load six months of data from tape? The machine could be off until you need it in that case.
wrong, you don't understand how it's usually done these days
it only need have the ability to access a SAN where replicated information from the primary server exists
you will not see any data movement to the machine
... of purple prose.
The mere existence of servers on standby is not a problem, let alone a "massive" one.
This ratio seems pretty close to the ratio of zombie public servants.
Achille Talon
Hop!
Unfortunate confusion of terminology. "Zombie computers" is a term also used for machines taken over by botnets.
well I had an encounter once with a product that had a feature developers called poor man's redundancy/failover - a doubled system that was neither really redundant nor was it able to failover. Switching off the other machine would indeed save some costs.
This is probably wrong - assuming the solution you propose is used (which does not have to be the case), you would still want to run some sort of watchdog signalling to be sure the failover machine is up and ready. This means that effectively you would still have some communication.
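A minimal sketch of the kind of watchdog signalling meant here, assuming the standby periodically reports in and a monitor flags it once the last report is too old. The class, its method names, and the timeout are all hypothetical; a real setup would send the heartbeat over the network, which is exactly the small amount of traffic in question.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per host and reports stale hosts as down."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # host name -> timestamp of last heartbeat

    def record(self, host, now=None):
        """Register a heartbeat from host (now is injectable for testing)."""
        self.last_seen[host] = time.monotonic() if now is None else now

    def is_alive(self, host, now=None):
        """True if host has sent a heartbeat within the timeout window."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(host)
        return last is not None and (now - last) <= self.timeout_seconds


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=30)
    monitor.record("standby-1")
    print(monitor.is_alive("standby-1"))
    print(monitor.is_alive("standby-2"))  # never reported in
```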
So you leave zombies on the wire, staking your claim. Then when you need the space, you swap it. Otherwise it's an endless wait for power, cooling, CAB, governance, and all sorts of fail.
No you get to keep the rack/rack units/cage space once you have acquired it as long as you pay your bill.
In a co-location environment, yes. In a standard business environment, the GP's response is true.
Of course, the article's definition of "useful" might not be a sysadmin's definition of "useful". Redundant machines, backup machines, extra capacity machines, dev machines, test machines, support machines, etc. all might be considered non-useful to customers, sales department, HR, or the CEO.
yes, but these researchers were ignoring traffic below a certain threshold.
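For what a threshold-based rule could look like, here is a sketch only. The thread does not give the study's actual method or cutoff, so the threshold value, the 180-day window standing in for "six months", and the function name are all assumptions.

```python
from datetime import date, timedelta


def is_comatose(daily_traffic, today, threshold_bytes, window_days=180):
    """Classify a server as comatose if no day in the window exceeds the threshold.

    daily_traffic: dict mapping datetime.date -> bytes transferred that day.
    threshold_bytes: traffic at or below this level is treated as noise
                     (assumed cutoff, e.g. heartbeats or monitoring pings).
    """
    window_start = today - timedelta(days=window_days)
    for day, traffic in daily_traffic.items():
        if window_start < day <= today and traffic > threshold_bytes:
            return False  # at least one day of real activity in the window
    return True


if __name__ == "__main__":
    today = date(2015, 6, 30)
    # A standby that only ever sees a trickle of watchdog traffic.
    standby = {today - timedelta(days=n): 2_000 for n in range(180)}
    print(is_comatose(standby, today, threshold_bytes=10_000))
```

Note that a rule like this would flag a quiet hot standby just as readily as an abandoned box, which is the objection raised elsewhere in the thread.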
No, in the big banks, it's the disk servers that do the mirroring themselves, not the application servers. Except for software updates and configuration changes, the application servers just sit idle at the backup site.
I do not fail; I succeed at finding out what does not work.
Well if it is broken, fix it. In your case the system was intended to be a redundant server, however, it did not provide real redundancy.
A bit low, but reasonable. Try making stuff that goes on ships, there's usually double redundancy AND a completely mechanical system in case everything goes to pot.
Liberty - Security - Laziness - Pick any two.
Back when I was a sysadmin for a government department I had been assigned a couple of chassis of HP blades that were bought in one of the famous fiscal year end splurges. For the most part I had no use for them and I didn't even install Linux on them. I think I only ever used a couple of blades and I hated them. It was the first generation and they ran very hot and we had lots of issues with bad RAM. The other three chassis on the rack belonged to the VMWare team and were in heavy use.
Since I had no need for the servers to be on I tried to have the blades turned off but the VMWare team was always turning them on. I would go on occasionally and turn off all of my machines but not long after they would be back on doing absolutely nothing. I never got a good answer from them why my servers had to be turned on wasting electricity and heating up the data centre (especially since special cooling had to be installed for the HP blades, yes they ran that hot).
You're a particularly special kind of "stupid", aren't you?
The disk servers are mirroring to the backup disk servers, obviously. And I used the term "disk server" because there are several vendors and brands of products available that do the same job.
The last time Microsoft had a major Xbox Live outage due to high demand they just spun up a bunch of VMs and everything was fine 4 hours later. You keep them idling so that when you need 'em they're ready at a moment's notice. Also if you're not Microsoft or Oracle this means you're not paying the licensing costs associated with the software being in production non-stop.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
I know the industry I'm in, we have regulations which require 3+ years of data retention which "isn't providing anything useful" until it is. If we have a legal "issue" then that will extend until the legal issue goes away and the judge says we can destroy data. While we can use archive methods, sometimes the live system is really what is needed to retrieve data. It's better to just keep disks spinning than shut them down and hope they spin back up.
IT has a long tail where I work. Things that are planned to last 5 years often have a good deal of life for another 2-5 years (not all, but many). The "usage" of these systems may only be once a month, quarter, or even annually, but keeping them makes more sense than porting over data that doesn't need to be kept in the replacement system.
Many times even when we have an "official" cutoff for a system, we just power it down and let it sit in the rack until the next year's inventory, at which time it is then sent off to the auction yard (sans hard drives) to be bid on by the pallet load.
I can see some machines snoozing for long periods of time, but not 1/3 of a place:
1: Hypervisor-level failover on VMWare or Hyper-V. Generally there are hypervisor updates, such as the recent SSL holes which required an update on ESXi, and other security items on Hyper-V [1]. However, these can sit for a good while untouched, and ready to handle a vMotion punt at a moment's notice.
2: Failure on an active/passive configuration at the DB level. With something like Oracle RAC that costs a lot for licensing, why not just go active/active? In general, the DB application and the OS should be upgraded more often, but I can see someone just tossing
3: IBM PowerHA. Since the virtualization firmware is generally upgraded during the latter months (when new TLs/MLs come out), these machines can probably sit around doing nothing for most of a year.
[1]: I'm meaning "raw" Hyper-V servers that are not part of a Windows Server OS install. Neglecting a Windows Server OS install is just asking for the box to become a "client" of another sort.
Grr, quick addendum on #2: I can see a firm just tossing the OS and application on a machine and walking off, but in general that isn't a good practice.
I've been in IT Management for 15+ years and I can assure you it is a good thing you are not in management. I would lose my job in a heartbeat if a production server decided to take a dump and I had shut off all our fail-over servers.
It's not just a matter of what those fail-over servers cost. It's the question "Can we afford (financially) to NOT have fail-over servers?". If you stand to lose more due to a production server failure than the cost of running a fail-over for a year then you will not EVER wish to be caught without one.
I'm with you in general but it can be incredibly difficult to get an estimate from business intelligence on how much you actually stand to lose per hour of downtime.
In the free world the media isn't government run; the government is media run.
People pay for servers all the time and never use them. If they paid for the year, then why should the hoster care?
Buck Feta. You know what to do.
From personal experience, the bureaucracy of our org makes procurement of servers so difficult that section managers tend to hoard them when they get them.
I'm hoping virtualization will improve this situation, but something tells me it will only create different problems. The bureaucratic culture usually invents new ways to foul up new tools.
Table-ized A.I.
turn them into mail servers ... then spammers will keep them active.
now we need to go OSS in diesel cars
don't let management know you are doing this. they will say you are wasting time.
There's this rumor that when Yahoo expanded its Lockport "chicken coop" data centers in upstate NY they vacated at least two large data centers in Northern VA and because the lease isn't up for another two years they have been mostly empty ever since.
Yet, Yahoo is saving lots of money by doing this.
Kriston
You do not spec for "average" usage; you spec for *max*. You also have to spec for how many machines (when we're talking about thousands, or tens of thousands of servers) are going to fail today, to be picked up by the "zombie" machines that are, in fact, hot spares.
And then there's the Big Events, like the shooting in Charleston, or when the SCOTUS announces about gay marriage or the ACA - how many of those "zombie" machines are going to go live to help carry the traffic load?
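The sizing argument can be sketched roughly like this. It assumes identical servers, a single peak-load figure, and a flat daily failure rate used to size the hot spares; all names and numbers are illustrative, not from the article. The second function shows how many machines that leaves looking idle on an average day.

```python
import math


def servers_to_provision(peak_load, per_server_capacity, daily_failure_rate):
    """Servers needed to carry peak load, plus hot spares for expected failures.

    daily_failure_rate: assumed fraction of the fleet that fails on a given day.
    """
    needed_at_peak = math.ceil(peak_load / per_server_capacity)
    hot_spares = math.ceil(needed_at_peak * daily_failure_rate)
    return needed_at_peak + hot_spares


def idle_fraction_at_average(average_load, per_server_capacity, provisioned):
    """Fraction of the provisioned fleet not needed to carry the average load."""
    busy = math.ceil(average_load / per_server_capacity)
    return 1.0 - busy / provisioned


if __name__ == "__main__":
    fleet = servers_to_provision(peak_load=10_000, per_server_capacity=100,
                                 daily_failure_rate=0.015)
    idle = idle_fraction_at_average(6_000, 100, fleet)
    print(f"{fleet} servers provisioned, {idle:.0%} idle at average load")
```

With a peak that is well above the average, a large idle share on a normal day falls straight out of the arithmetic.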
mark
It cost $50 to get the data chimp to power a server on.