AOL Creates Fully Automated Data Center

Posted by Unknown on Tuesday October 11, 2011 @10:18AM from the tomorrow-system-architect-automates-himself dept.

miller60 writes with an excerpt from a Data Center Knowledge article: "AOL has begun operations at a new data center that will be completely unmanned, with all monitoring and management being handled remotely. The new 'lights out' facility is part of a broader updating of AOL infrastructure that leverages virtualization and modular design to quickly deploy and manage server capacity. 'These changes have not been easy,' AOL's Mike Manos writes in a blog post about the new facility. 'It's always culturally tough being open to fundamentally changing business as usual.'" Mike Manos's weblog post provides a look into AOL's internal infrastructure. It's easy to forget that AOL had to tackle scaling to tens of thousands of servers over a decade before the term Cloud was even coined.

Wow .. how '2000'ish by johnlcallaway · 2011-10-11 10:27 · Score: 3, Informative

Wow ... we were doing this 10 years ago, before virtual systems were commonplace; 'computers on a card' were just coming out. Data center was 90 miles away. All monitoring and managing was done remotely. The only time we ever went to the physical data center was if a physical piece of hardware had to be swapped out. Multiple IP addresses were configured per server so any single server on one tier could act as a failover for another one on the same tier. We used firewalls to automate failovers; hardware failures were too infrequent to spend money on other methods. We could rebuild Sun servers in 10 minutes from saved images. All software updates were scripted and automated. A separate maintenance network was maintained. Logins were not allowed except on the maintenance network, and all ports were shut down except for ssh. A remote serial interface provided hard-console access to each machine if the network to a system wasn't available.

Yawn ......

--
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
1. Re:Wow .. how '2000'ish by johnlcallaway · 2011-10-11 10:32 · Score: 3, Informative
  
  Thanks for not pointing to the actual blog in the original article. So what they are really blogging is their ability to move an entire DATA CENTER without having to send people to do it. Other than .. you know .. install the hardware to start with.
  
  Never mind........
  
  --
  I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Who? by Jailbrekr · 2011-10-11 10:27 · Score: 3, Insightful

Seriously though, most telecom operations operate like this. Their switching centers are all fully automated and unmanned, and usually in the basement of some nondescript building. This is nothing new.

--
Feed the need: Digitaladdiction.net
What by Dunbal · 2011-10-11 10:28 · Score: 3, Funny

AOL still exists? Wow. Yeah ok I guess this is the result of years of beancounter thinking - the expensive part of running the service and the reason they were losing money was the IT staff, huh? Glad I closed my CompuServe account before giving these guys any money.

--
Seven puppies were harmed during the making of this post.
1. Re:What by billcopc · 2011-10-11 11:03 · Score: 4, Informative
  
  How often does shit hit the fan in that sort of environment?
  As a hybrid techie who does a lot of hardware work, I would much rather go in once a month, fix a batch of issues in one visit, collect my fat cheque and go back to the pub, than spend 40+ hours a week playing Bejeweled, waiting for stuff to break.
  I would expect AOL's strategy to greatly reduce costs, because that $15/hr rack monkey costs a lot more than $15/hr in the end. They have benefits, you have to "manage" them, they need human comforts like bathrooms, cleaning, seating, heating/air, lunch room. From an efficiency standpoint, the contractor route is more efficient in both money and time.
  
  --
  -Billco, Fnarg.com
Re:What is AOL again. ..? by SwedishChef · 2011-10-11 10:37 · Score: 3, Funny

I thought everyone knew... AOL is the Internet.

--
No one ever had to evacuate a city because the solar panels broke!
Two points. by rickb928 · 2011-10-11 10:58 · Score: 3, Insightful

One - If there is redundancy and virtualization, AOL can certainly keep services running while a tech goes in, maybe once a week, and swaps out the failed blades that have already been remotely disabled and their usual services relocated. This is not a problem. Our outfit here has a lights-out facility that sees a tech maybe every few weeks, and other than that a janitor keeps the dust bunnies at bay and makes sure the locks work daily. And yes, they've asked him to flip power switches and tell them what color the lights were. He's gotten used to this. That center doesn't have state-of-the-art stuff in it, either.
Two - Didn't AOL run on a mainframe (or more than one) in the 90s? It predated anything useful, even the Web I think. Netscape was being launched in 1998, Berners-Lee was making a NeXT browser in 1990, and AOL for Windows existed in 1991. Mosaic and Lynx were out in 1993. AOL sure didn't need any PC infrastructure, it predated even Trumpet Winsock, I think, and Linux. I don't think I could have surfed the Web in 1991 with a Windows machine, but I could use AOL.

--
deleting the extra space after periods so i can stay relevant, yeah.
Re:Offtopic, but IT workers? by aix+tom · 2011-10-11 11:00 · Score: 3, Insightful

The software still needs to be written. The programs still need to be run somewhere.
Technically not much has changed. The "Cloud" is still made up of servers that have to be administered. The main effect is that the IT and network admins will have to keep up with technology, especially the new virtualization layers between the hardware and the running application. But keeping up to date has always been a part of working in IT.
Re:So it will take ages for a fix by Martin+Blank · 2011-10-11 11:04 · Score: 4, Interesting

One of the major backbone providers has a lights-out data center not far from my work. I know a guy who has a hosting business there, and he's shown me around to the limits of his access. There is no one on-site from the company or its contractors--not even a security guard. They have biometrics plus PINs for access; it's laced with low-light/IR cameras (it wouldn't surprise me to learn they have microphones); it has motion detectors in case the cameras miss something; and the redundancy is incredible. They maintain contracts with local electricians, plumbers, and a few technical companies should a blade burn out. They manage the entire thing from a few states over, and as of a couple of years ago almost all of their data centers had been converted to run this way. Savings were good, something like a million dollars per DC per year even as unanticipated downtime decreased.
I looked at it and saw the future of IT. I wasn't sure if I was more impressed or scared.

--
You can never go home again... but I guess you can shop there.
Re:Uh.... by silverglade00 · 2011-10-11 11:15 · Score: 3, Funny

Nobody will be there to see Skynet become self-aware. What... you thought the end of humanity wouldn't come from AOL?
Re:So it will take ages for a fix by Zocalo · 2011-10-11 11:29 · Score: 3, Insightful

Who cares? I'm guessing you don't have much experience of server clusters but generally, long before you get to the kind of scale we are talking about here, you start treating servers in the same way you might treat HDDs in a RAID array. When one fails, other servers in the cluster pick up the slack until you can either repair the broken unit or you simply remote install the appropriate image onto a standby server and bring that up until an engineer physically goes to site. Handling of the data is somewhat critical though; should a server die you ideally need to be able to resume what it was working on seamlessly and without causing any data corruption; think transaction-based DB queries and timeout/retry.
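
To make the "transaction plus timeout/retry" idea concrete, here is a minimal sketch in Python using the standard `sqlite3` module. Everything in it is an assumption for illustration and not something the comment describes: the `accounts` and `applied` tables, the `ServerDown` exception, and the `fail_midway` flag that simulates a worker dying before it commits. The point is only that a rolled-back transaction plus a recorded request ID lets a retry on another server finish the work without applying it twice.

```python
import sqlite3


class ServerDown(Exception):
    """Raised by a (simulated) worker that dies in the middle of a request."""


def apply_once(conn, request_id, account, delta, fail_midway=False):
    """Apply a balance change in one transaction, at most once per request_id."""
    with conn:  # commits on success, rolls back if an exception escapes
        seen = conn.execute(
            "SELECT 1 FROM applied WHERE request_id = ?", (request_id,)
        ).fetchone()
        if seen:
            return  # an earlier attempt already committed, so the retry is a no-op
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (delta, account),
        )
        if fail_midway:
            # Simulated crash: the UPDATE above is rolled back, nothing is left half-done.
            raise ServerDown("worker died before commit")
        conn.execute("INSERT INTO applied (request_id) VALUES (?)", (request_id,))


def with_retry(conn, request_id, account, delta, failures):
    """Retry until one attempt commits; the first `failures` attempts are made to die."""
    attempts = 0
    while True:
        attempts += 1
        try:
            apply_once(conn, request_id, account, delta,
                       fail_midway=attempts <= failures)
            return attempts
        except ServerDown:
            continue  # in a real cluster this would go to a different server


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE applied (request_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

attempts = with_retry(conn, "req-1", 1, 25, failures=2)
with_retry(conn, "req-1", 1, 25, failures=0)  # duplicate delivery, ignored
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(attempts, balance)
```

Two workers "die" before committing, the third succeeds, and a later duplicate of the same request changes nothing, so the balance moves by 25 exactly once.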

If you have enough spare servers, you can easily get by with engineers only needing to go on site once a month or so, assuming you get your MTBF calculations right, that is. There's a good white paper by Google on how 200,000 hr MTBF hard drive failure rates equate to drive failures every few hours when you have a few 100k HDs.
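
The MTBF arithmetic is easy to check. The sketch below assumes independent failures at a constant rate, which is the usual simplification behind a single MTBF figure and not a claim about what the Google paper measured. The function names and the 30-day visit interval are illustrative assumptions.

```python
def hours_between_failures(mtbf_hours, units):
    """Average time between failures somewhere in a fleet of identical units."""
    return mtbf_hours / units


def expected_failures(mtbf_hours, units, interval_hours):
    """Expected number of failed units that accumulate over one interval."""
    return units * interval_hours / mtbf_hours


MTBF_HOURS = 200_000       # per-drive figure quoted in the comment
VISIT_INTERVAL = 30 * 24   # assumed: one engineer visit every 30 days

for drives in (100_000, 300_000):
    gap = hours_between_failures(MTBF_HOURS, drives)
    backlog = expected_failures(MTBF_HOURS, drives, VISIT_INTERVAL)
    print(f"{drives} drives: one failure every {gap:.2f} h, "
          f"about {backlog:.0f} dead drives per visit")
```

With 100,000 drives that works out to one failure every 2 hours and roughly 360 failed drives waiting at each monthly visit, so that backlog is the minimum number of spares the cluster has to absorb between trips.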

--
UNIX? They're not even circumcised! Savages!
Re:no security or maintenance? by EdIII · 2011-10-11 12:34 · Score: 3, Insightful

The whole idea is not to need to get to stuff quicker at all.
If you:
1) Are completely virtualized.
2) Use power circuits that are monitored for load, with battery backup, power conditioners, and diesel generators for local utility backup.
3) Use management devices to control all your bare metal as if you are standing there, complete with USB-connected storage per device that you can swap out the ISO for.
4) Have redundancy in your virtualization setup that allows you to have high availability, live migration, automated backups, etc.
What you get is an infrastructure that allows you to route around failures and schedule hardware swap outs on your own timetable, which can be far more economical.
If you don't have that then it does involve costly emergency response at 2am to replace a bare metal server that went down. You either pay somebody you have retained locally to do it, or you are the one driving down to the datacenter at 2am to do the replacement yourself with who-the-heck-knows how long it will take with uptime monitoring solutions sending out emails like crazy to the rest of the admin staff, and heavens help you, some execs that demanded to be in the loop from now on due to an "incident".
Don't know about you..... but I would rather be able to relax at 10pm and have a few beers once in a while (to the point I can't drive) without worrying about bare metal servers going down all the time, or who is on call, etc.