AOL Creates Fully Automated Data Center
miller60 writes with an excerpt from a Data Center Knowledge article: "AOL has begun operations at a new data center that will be completely unmanned, with all monitoring and management being handled remotely. The new 'lights out' facility is part of a broader updating of AOL infrastructure that leverages virtualization and modular design to quickly deploy and manage server capacity. 'These changes have not been easy,' AOL's Mike Manos writes in a blog post about the new facility. 'It's always culturally tough being open to fundamentally changing business as usual.'"
Mike Manos's weblog post provides a look into AOL's internal infrastructure. It's easy to forget that AOL had to tackle scaling to tens of thousands of servers over a decade before the term Cloud was even coined.
How long will it take for an engineer to get there to replace a card or server?
Is now hands-off?
So they have a fully automated unmanned data center... For their fully unused unpopulated services?
WIN!
Wow ... we were doing this 10 years ago, before virtual systems were commonplace; 'computers on a card' were just coming out. The data center was 90 miles away. All monitoring and managing was done remotely. The only time we ever went to the physical data center was if a physical piece of hardware had to be swapped out. Multiple IP addresses were configured per server so any single server on one tier could act as a failover for another one on the same tier. We used firewalls to automate failovers; hardware failures were too infrequent to spend money on other methods. We could rebuild Sun servers in 10 minutes from saved images. All software updates were scripted and automated. A separate maintenance network was maintained. Logins were not allowed except on the maintenance network, and all ports were shut down except for ssh. A remote serial interface provided hard-console access to each machine if the network to a system wasn't available.
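For anyone wondering what the "any server on a tier can cover for another" idea looks like as bookkeeping, here is a minimal sketch. It only models which server answers for which IP; the firewall rules that actually steered traffic are not shown, and the function name `fail_over`, the server names, and the addresses are all made up for illustration.

```python
from typing import Dict, List


def fail_over(tier: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new server -> IP list mapping with the failed server's IPs
    handed to the surviving servers on the same tier.

    Hypothetical helper: each IP goes to whichever survivor currently holds
    the fewest addresses (ties broken by server name).
    """
    survivors = {name: list(ips) for name, ips in tier.items() if name != failed}
    if not survivors:
        raise RuntimeError("no surviving server left on this tier")
    for ip in tier.get(failed, []):
        # Pick the least-loaded survivor so no single box absorbs everything.
        target = min(sorted(survivors), key=lambda name: len(survivors[name]))
        survivors[target].append(ip)
    return survivors


# Example tier with one service IP per server (all values are placeholders).
web_tier = {
    "web1": ["10.0.0.1"],
    "web2": ["10.0.0.2"],
    "web3": ["10.0.0.3"],
}
after = fail_over(web_tier, "web2")
print(after)
```

In a real setup the reassignment would have to be followed by whatever makes the network notice the move; this sketch stops at deciding who takes over.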
......
Yawn
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Seriously. AOL keeps my relative's PC experience safe; which, generally, keeps them from bugging me for help. :-)
Seriously though, most telecom operations operate like this. Their switching centers are all fully automated and unmanned, and usually in the basement of some nondescript building. This is nothing new.
Feed the need: Digitaladdiction.net
... but their last geek quit, so now the data center must fend for itself.
With a lot of my friends believing their code monkey jobs were a dead end and becoming IT/network admins, I wonder how cloud computing will affect the market. Will we see more of these people switching back to software engineering?
AOL still exists? Wow. Yeah ok I guess this is the result of years of beancounter thinking - the expensive part of running the service and the reason they were losing money was the IT staff, huh? Glad I closed my CompuServe account before giving these guys any money.
Seven puppies were harmed during the making of this post.
Seems like it may take time for anyone to come to the site for anything, versus having a few people on site to get to stuff quicker.
I'm still expecting their datacenters to be unmanned and using zero electricity soon. I'm surprised they have lasted this long.
I'm from Europe. What is AOL again? And what is its/their significance in 2011/2012 anyway?
- Jesper
My security clearance is so high I have to kill myself if I remember I have it...
In other news, the rest of AOL is expected to go "lights out" any time now.
Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
AOL? Who they?
But I can't resist.
...In Soviet Russia, remote hands are YOURS!
It's pretty easy to automate a bunch of off switches. ;)
Everybody's data center is fully automated until they decide to make a change they hadn't thought of in the first place. Then you have unauthorized cross-connects running everywhere and desktops running RHEL2, for that one app the developers insist won't run on a VM, hidden behind racks so the DC owners won't find them.
The new data center with 0 head count matches nicely the AOL user base with 0 head count!
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
One - If there is redundancy and virtualization, AOL can certainly keep services running while a tech goes in, maybe once a week, and swaps out the failed blades that have already been remotely disabled and had their usual services relocated. This is not a problem. Our outfit here has a lights-out facility that sees a tech maybe every few weeks, and other than that a janitor keeps the dust bunnies at bay and makes sure the locks work daily. And yes, they've asked him to flip power switches and tell them what color the lights were. He's gotten used to this. That center doesn't have state-of-the-art stuff in it, either.
Two - Didn't AOL run on a mainframe (or more than one) in the 90s? It predated anything useful, even the Web I think. Netscape launched in 1994, Berners-Lee was making a NeXT browser in 1990, and AOL for Windows existed in 1991. Mosaic and Lynx were out in 1993. AOL sure didn't need any PC infrastructure; it predated even Trumpet Winsock, I think, and Linux. I don't think I could have surfed the Web in 1991 with a Windows machine, but I could use AOL.
deleting the extra space after periods so i can stay relevant, yeah.
Oh yeah, to house all the dial-up modems...
I hate being bipolar; it's awesome!
I didn't know AOL even still existed!
Finding God in a Dog
AOWho?
The Dunning-Kruger effect explains most posts on
n/t /obligatory
In a world of the blind, the one-eyed man is king--and the two-eyed man is a heretic.
how does redundancy help you when the main power switch goes down / on fire and there is no one there? Let's see firemen make a big mess and no one is there to start the rebuild, or it may just do a safe shutdown, just to send someone out, just to find out you need to call in this other guy to fix the switch or generator.
how does redundancy help you when the main power switch goes down / on fire and there is no one there
If you are a big enough operation, you have redundancy at the data center level. i.e. you can lose an entire data center and have no loss of service on your production applications. Other than a possible speed/performance degradation, your average customer has no knowledge that anything bad has happened.
At least that way they won't need "heroic support"
lucm, indeed.
What are they doing nowadays that requires multiple servers?
... I say FUUUUUUUUUUUUUUUuuuuuu...
The eternal struggle of good vs. evil begins within one's self.
....wait for it .... Smynet! (Someone typoed)
...and Daddy Warbucks got some dough - in a manner of speaking, as it were, etc und so weiter.
This is why you have a duplicate data center in another city that is kept in standby and is just sitting there ready to take over. (Actually, you normally have a mix of services active at either location.)
The company I work for makes telecom equipment, and supporting geo redundancy is a fairly key requirement for some major customers.
To start chewing through wires, causing power outages, starting fires, pooping in the mailbox, that kind of stuff.
Vote monkeys into Congress. They are cheaper and more trustworthy.
cool story bro.
"Redundancy" was not meant for the staff.
how does redundancy help you when the main power switch goes down
The natural-gas-fed backup generators automatically kick on, supplying more than enough power to run the facility for an indefinite period of time.
on fire
Halon suppression systems. Puts out the fire, doesn't do anything to the equipment. It'll suffocate humans, however, so it's best used in an unmanned room.
Let's see firemen make a big mess
They are trained on how to deal with Halon systems and on the specifics of fires in datacenters and other electric-heavy facilities.
and no one is there to start the rebuild
Within a few minutes of the alarms tripping, someone in a central monitoring center will dispatch repair techs to the site. There will be a spares depot located somewhere close by with replacement equipment.
find out you need to call in this other guy to fix the switch or generator.
Generators and electrical are usually contracted out to some kind of local company which specializes in that stuff.
Un-manned simply means there isn't someone there on a daily basis. I'm not sure why this is being talked up like it's some kind of new concept, since tens of thousands of companies all around the planet have been doing this for many, many years.
One of the early search engines, I think Infoseek, worked this way. Machines were installed in blocks of 100 (this was before 1U servers) and never replaced individually. Failed machines were powered off remotely. When some fraction of the block had failed, about 20%, the whole cluster was replaced.
There's a lot to be said for this. You have less maintenance-induced failure. Operating costs are low.
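As a rough sketch of that policy (blocks of 100, dead machines powered off and left in place, whole block swapped at about 20% failed), something like the following would do the bookkeeping. The class name `Block` and its methods are invented for illustration; nothing here is known to match what Infoseek actually ran.

```python
class Block:
    """Tracks failures in one block of machines that are never repaired individually."""

    def __init__(self, size: int = 100, replace_threshold: float = 0.20) -> None:
        self.size = size
        self.replace_threshold = replace_threshold  # fraction failed that triggers a swap
        self.failed = set()  # machine indexes that have been powered off remotely

    def power_off(self, machine: int) -> None:
        """Record a failed machine; it stays in the rack, just switched off."""
        if not 0 <= machine < self.size:
            raise ValueError("machine index outside this block")
        self.failed.add(machine)

    def failed_fraction(self) -> float:
        return len(self.failed) / self.size

    def needs_replacement(self) -> bool:
        """True once the failed fraction reaches the threshold."""
        return self.failed_fraction() >= self.replace_threshold


block = Block()
for machine in range(19):
    block.power_off(machine)
print(block.needs_replacement())  # 19 of 100 failed: still below the threshold
block.power_off(19)
print(block.needs_replacement())  # 20 of 100 failed: time to swap the block
```

The point of the design is that nobody touches a rack until the whole block is due, which is where the lower maintenance-induced failure rate comes from.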
...over a decade before the term Cloud was even coined.
You mean back when it was called 'grid'?
What they did:
* Modularize/Standardize Infrastructure, e.g. storage & computing power
* Build provisioning systems
* Virtualize everything
When they say that they are flexible, they mean that they have a lot of dark hardware lying around.
Who replaces the dead hard drives?
This speaks to me of an underutilized data center. Real measured disk MTBF rates are on the order of 7 years (to quote one randomly googled study on disk MTBF, "between 2-13% disk replacement per year"). In a data center with 10,000 servers, assuming just a single disk per server, you should be replacing about 4 disks a day. In truth I'd expect a hardworking DC to have far more than a single disk per server on average. Along with all the other intermittent failures of RAM, CPU, cabling, network devices etc., a surprisingly high percentage of a data center is offline at any given time, and the longer you leave it unfixed, the higher the percentage. The reason you have staff on the ground is to keep that number low... unless you aren't working the DC hard, in which case it doesn't matter if you've got 25% of the cell offline waiting for someone to come in and do repairs.
I suspect they may be talking about much smaller facilities than the large players (Google, Amazon, Microsoft) who have genuinely large DCs. If they're mostly talking about content delivery then they don't require lots of machines, especially if cache hit rates are high. With a smaller facility there isn't the same compelling logic of failure rates, utilization and sheer capital to warrant onsite staff. It would make sense for them to have small, dispersed facilities to give them a CDN-like low latency delivery mechanism, better redundancy in case of failures etc... so long as they can architect their software to support it (which in many cases may just be adding a 'remote' proxy layer or the like).
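The back-of-envelope arithmetic here is easy to check. The sketch below assumes a constant failure rate, so a 7-year MTBF is treated as roughly a 1/7 annual replacement rate, and it assumes one disk per server as stated above. The function names are made up for this example.

```python
def annual_rate_from_mtbf(mtbf_years: float) -> float:
    """Approximate annual replacement rate, assuming a constant failure rate."""
    return 1.0 / mtbf_years


def replacements_per_day(disks: int, annual_rate: float) -> float:
    """Expected number of disk replacements per day across the fleet."""
    return disks * annual_rate / 365.0


def backlog_between_visits(disks: int, annual_rate: float, days_between_visits: int) -> float:
    """Expected number of dead disks piled up when a tech only visits every N days."""
    return replacements_per_day(disks, annual_rate) * days_between_visits


DISKS = 10_000  # 10,000 servers, one disk each

per_day = replacements_per_day(DISKS, annual_rate_from_mtbf(7))
low = replacements_per_day(DISKS, 0.02)    # 2% per year, low end of the quoted range
high = replacements_per_day(DISKS, 0.13)   # 13% per year, high end of the quoted range
weekly = backlog_between_visits(DISKS, annual_rate_from_mtbf(7), 7)

print(f"7-year MTBF: {per_day:.2f} disks/day")
print(f"2%-13% per year: {low:.2f} to {high:.2f} disks/day")
print(f"weekly visits: about {weekly:.0f} dead disks waiting each time")
```

So "4 a day" holds up for the 7-year figure, and with weekly visits instead of on-site staff you would expect a couple of dozen dead disks waiting at each visit, before counting any other hardware.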