AOL Creates Fully Automated Data Center

Posted by Unknown on Tuesday October 11, 2011 @10:18AM from the tomorrow-system-architect-automates-himself dept.

miller60 writes with an excerpt from a Data Center Knowledge article: "AOL has begun operations at a new data center that will be completely unmanned, with all monitoring and management being handled remotely. The new 'lights out' facility is part of a broader updating of AOL infrastructure that leverages virtualization and modular design to quickly deploy and manage server capacity. 'These changes have not been easy,' AOL's Mike Manos writes in a blog post about the new facility. 'It's always culturally tough being open to fundamentally changing business as usual.'" Mike Manos's weblog post provides a look into AOL's internal infrastructure. It's easy to forget that AOL had to tackle scaling to tens of thousands of servers over a decade before the term Cloud was even coined.

Wow .. how '2000'ish by johnlcallaway · 2011-10-11 10:27 · Score: 3, Informative

Wow ... we were doing this 10 years ago, before virtual systems were commonplace; 'computers on a card' were just coming out. The data center was 90 miles away. All monitoring and managing was done remotely. The only time we ever went to the physical data center was if a physical piece of hardware had to be swapped out. Multiple IP addresses were configured per server so any single server on one tier could act as a failover for another one on the same tier. We used firewalls to automate failovers; hardware failures were too infrequent to spend money on other methods. We could rebuild Sun servers in 10 minutes from saved images. All software updates were scripted and automated. A separate maintenance network was maintained. Logins were not allowed except on the maintenance network, and all ports were shut down except for ssh. A remote serial interface provided hard-console access to each machine if the networks to a system weren't available.
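
The per-tier IP takeover described above can be sketched roughly as follows. This is only an illustration, not the poster's actual setup (which used firewalls to automate failover): the `Server` class, the `fail_over` function, and the addresses are all hypothetical, and the "least-loaded peer" rule is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one server in a tier and the service IPs it answers for."""
    name: str
    healthy: bool = True
    ips: set[str] = field(default_factory=set)


def fail_over(tier: list[Server]) -> None:
    """Move service IPs off unhealthy servers onto the least-loaded healthy peer."""
    survivors = [s for s in tier if s.healthy]
    if not survivors:
        raise RuntimeError("no healthy server left in tier")
    for server in tier:
        if server.healthy:
            continue
        for ip in sorted(server.ips):
            # Assumption: spread load by picking the peer with the fewest IPs.
            target = min(survivors, key=lambda s: len(s.ips))
            target.ips.add(ip)
        server.ips.clear()


# Example tier with made-up addresses; web2 fails and a peer takes over its IP.
tier = [
    Server("web1", ips={"10.0.0.1"}),
    Server("web2", ips={"10.0.0.2"}),
    Server("web3", ips={"10.0.0.3"}),
]
tier[1].healthy = False
fail_over(tier)
```

Because every server in the tier is already configured to answer for any of the tier's addresses, takeover is only a reassignment and needs no rebuild.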

Yawn ......

--
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
1. Re:Wow .. how '2000'ish by johnlcallaway · 2011-10-11 10:32 · Score: 3, Informative
  
  Thanks for not pointing to the actual blog in the original article. So what they are really blogging is their ability to move an entire DATA CENTER without having to send people to do it. Other than .. you know .. install the hardware to start with.
  
  Never mind........
  
  --
  I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
2. Re:Wow .. how '2000'ish by timeOday · 2011-10-11 11:25 · Score: 2
  
  And other new tech fads are good reimplementations of ideas that didn't pan out in the past but are now feasible due to advances in technology. You really can't generalize without looking at specifics - "somebody tried that a long time ago and it wasn't worth it" doesn't necessarily prove anything.
3. Re:Wow .. how '2000'ish by rednip · 2011-10-11 12:04 · Score: 2
  
  "somebody tried that a long time ago and it wasn't worth it" doesn't necessarily prove anything.
  Unless there is some change in technology or technique, past failures are a good indicator of continued inability.
  
  --
  The force that blew the Big Bang continues to accelerate.
Who? by Jailbrekr · 2011-10-11 10:27 · Score: 3, Insightful

Seriously though, most telecom operations operate like this. Their switching centers are all fully automated and unmanned, and usually in the basement of some nondescript building. This is nothing new.

--
Feed the need: Digitaladdiction.net
1. Re:Who? by rickb928 · 2011-10-11 10:44 · Score: 2
  
  Um, I wouldn't be comfortable with my telecom's switching centers in basements. These are most commonly the first room to flood when the water comes, and telecom switches are everywhere their users are.
  I see telecom switches housed above ground, in plain, sometimes unmarked buildings. There's one a quarter mile from my house, and I drive by two others to go to work. If they have basements, I bet that's where they keep stuff that doesn't matter as much.
  And the huge switch that used to work in my old hometown, one of the last crossbar switches in the U.S. to convert to ESS. It was deafening in there, and the basement was empty. Six floors of relays going constantly. The mice ate the insulation like it was licorice. Putting any of that in the basement would be wrong, even if it was built on a hill.
  
  --
  deleting the extra space after periods so i can stay relevant, yeah.
What by Dunbal · 2011-10-11 10:28 · Score: 3, Funny

AOL still exists? Wow. Yeah ok I guess this is the result of years of beancounter thinking - the expensive part of running the service and the reason they were losing money was the IT staff, huh? Glad I closed my CompuServe account before giving these guys any money.

--
Seven puppies were harmed during the making of this post.
1. Re:What by Synerg1y · 2011-10-11 10:45 · Score: 2
  
  The contractors warranty their work :) Sometimes makes all the difference, the $15/h tech is just miserable usually.
2. Re:What by billcopc · 2011-10-11 11:03 · Score: 4, Informative
  
  How often does shit hit the fan in that sort of environment ?
  As a hybrid techie who does a lot of hardware work, I would much rather go in once a month, fix a batch of issues in one visit, collect my fat cheque and go back to the pub, than spend 40+ hours a week playing Bejeweled, waiting for stuff to break.
  I would expect AOL's strategy to greatly reduce costs, because that $15/hr rack monkey costs a lot more than $15/hr in the end. They have benefits, you have to "manage" them, they need human comforts like bathrooms, cleaning, seating, heating/air, lunch room. From an efficiency standpoint, the contractor route is more efficient in both money and time.
  
  --
  -Billco, Fnarg.com
In other news by mccrew · 2011-10-11 10:36 · Score: 2

In other news, the rest of AOL is expected to go "lights out" any time now.

--
Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
Re:What is AOL again. ..? by SwedishChef · 2011-10-11 10:37 · Score: 3, Funny

I thought everyone knew... AOL is the Internet.

--
No one ever had to evacuate a city because the solar panels broke!
Re:So it will take ages for a fix by PTBarnum · 2011-10-11 10:41 · Score: 2

The article states "failed equipment is addressed in a scheduled way using outsourced or vendor partners". They don't care if an individual server is down, they just move the workload elsewhere, and wait for a repair. So there actually will be people in their data center doing repairs, they just aren't AOL employees and aren't based in the data center. I could see making a decision that a longer wait time for repairs is justified by labor savings, but it isn't really obvious where those savings come from. There is a suggestion in the article that they want the flexibility to increase or decrease the number of workers as needed, which is somewhat easier with contractors than regular employees, but with regular employees you can get a similar effect from part time or overtime work.
Re:What is AOL again. ..? by Moridineas · 2011-10-11 10:41 · Score: 2

They suck. They just suck differently now. They've switched from being an ISP to being a content company (and most of their content creators seem rather disgruntled). Mostly US-based, but most slashdotters should recognize names like TechCrunch or primarily HuffPo...the rest, not so much.
Works very well. by 140Mandak262Jamuna · 2011-10-11 10:57 · Score: 2

The new data center with 0 head count matches nicely the AOL user base with 0 head count!

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Two points. by rickb928 · 2011-10-11 10:58 · Score: 3, Insightful

One - If there is redundancy and virtualization, AOL can certainly keep services running while a tech goes in, maybe once a week, and swaps out the failed blades that have already been remotely disabled and their usual services relocated. This is not a problem. Our outfit here has a lights-out facility that sees a tech maybe every few weeks, and other than that a janitor keeps the dust bunnies at bay and makes sure the locks work daily. And yes, they've asked him to flip power switches and tell them what color the lights were. He's gotten used to this. That center doesn't have state-of-the-art stuff in it, either.
Two - Didn't AOL run on a mainframe (or more than one) in the 90s? It predated anything useful, even the Web I think. Netscape was being launched in 1994, Berners-Lee was making a NeXT browser in 1990, and AOL for Windows existed in 1991. Mosaic and Lynx were out in 1993. AOL sure didn't need any PC infrastructure, it predated even Trumpet Winsock, I think, and Linux. I don't think I could have surfed the Web in 1991 with a Windows machine, but I could use AOL.

--
deleting the extra space after periods so i can stay relevant, yeah.
1. Re:Two points. by Jay+L · 2011-10-11 16:59 · Score: 2
  
  AOL initially ran on a network of Stratus fault-tolerant minicomputers, each running two to eight 680x0 CPUs. Later we added unix boxen, some beefy SGIs and HPs for servers, and Suns for front-end telco interfacing IIRC. By the mid-90s we grew a Tandem fault-tolerant cluster for our critical databases; it did hot component failover, multimaster replication, all the stuff that's common today, but with SQL down in the drive controller for blazing speeds. We didn't really start moving to a PC-based architecture until the late '90s, when Linux provided cheap, reliable enough workhorses, and helped drive the big iron prices down too.
Re:Offtopic, but IT workers? by aix+tom · 2011-10-11 11:00 · Score: 3, Insightful

The software still needs to be written. The programs still need to be run somewhere.
Technically not much has changed. The "Cloud" is still made up of servers that have to be administered. The main effect is that the IT and network admins will have to keep up with technology, especially the new virtualization layers between the hardware and the running application. But keeping up to date has always been a part of working in IT.
Re:So it will take ages for a fix by Martin+Blank · 2011-10-11 11:04 · Score: 4, Interesting

One of the major backbone providers has a lights-out data center not far from my work. I know a guy who has a hosting business there, and he's shown me around to the limits of his access. There is no one on-site from the company or its contractors--not even a security guard. They have biometrics plus PINs for access; it's laced with low-light/IR cameras (it wouldn't surprise me to learn they have microphones); it has motion detectors in case the cameras miss something; and the redundancy is incredible. They maintain contracts with local electricians, plumbers, and a few technical companies should a blade burn out. They manage the entire thing from a few states over, and as of a couple of years ago almost all of their data centers had been converted to run this way. Savings were good, something like a million dollars per DC per year even as unanticipated downtime decreased.
I looked at it and saw the future of IT. I wasn't sure if I was more impressed or scared.

--
You can never go home again... but I guess you can shop there.
Re:Uh.... by silverglade00 · 2011-10-11 11:15 · Score: 3, Funny

Nobody will be there to see Skynet become self-aware. What... you thought the end of humanity wouldn't come from AOL?
AOL Needs a Data Center? by Trip6 · 2011-10-11 11:24 · Score: 2

Oh yeah, to house all the dial-up modems...

--
I hate being bipolar; it's awesome!
Re:So it will take ages for a fix by Zocalo · 2011-10-11 11:29 · Score: 3, Insightful

Who cares? I'm guessing you don't have much experience of server clusters but generally, long before you get to the kind of scale we are talking about here, you start treating servers in the same way you might treat HDDs in a RAID array. When one fails, other servers in the cluster pick up the slack until you can either repair the broken unit or you simply remote install the appropriate image onto a standby server and bring that up until an engineer physically goes to site. Handling of the data is somewhat critical though; should a server die you ideally need to be able to resume what it was working on seamlessly and without causing any data corruption; think transaction based DB queries and timeout/retry.

If you have enough spare servers, you can easily get by with engineers only needing to go on site once a month or so, assuming you get your MTBF calculations right that is. There's a good white paper by Google on how 200,000 hr MTBF hard drive failure rates equate to drive failures every few hours when you have a few 100k HDs.
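
As a rough check of that arithmetic, here is a minimal sketch. It assumes independent drives with a constant failure rate, and it takes "a few 100k" to mean 100,000 drives purely for illustration; the function names are made up for this example.

```python
def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Expected gap between failures anywhere in the fleet.

    Assumes drives fail independently at a constant rate of 1 / MTBF,
    so the fleet-wide failure rate is drive_count / mtbf_hours.
    """
    return mtbf_hours / drive_count


def expected_failures(mtbf_hours: float, drive_count: int, window_hours: float) -> float:
    """Expected number of drive failures in the fleet over a time window."""
    return drive_count * window_hours / mtbf_hours


# Illustrative numbers: 200,000 h MTBF from the comment, 100,000 drives assumed.
fleet_gap = hours_between_failures(200_000, 100_000)            # 2.0 hours
per_month = expected_failures(200_000, 100_000, 30 * 24)        # 360.0 failures

print(f"one failure every {fleet_gap} h, about {per_month:.0f} per 30-day month")
```

Under those assumptions a monthly site visit would have a few hundred failed drives to swap, which is why the spare capacity has to be sized from the failure rate rather than guessed.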

--
UNIX? They're not even circumcised! Savages!
Re:no security or maintenance? by EdIII · 2011-10-11 12:34 · Score: 3, Insightful

The whole idea is not to need to get to stuff quicker at all.
If you are:
1) Completely virtualized.
2) Use power circuits that are monitored for load, on a battery back up, power conditioners, and diesel fuel generators for local utility backup.
3) Use management devices to control all your bare metal as if you are standing there, complete with USB connected storage per device that you can swap out the iso for.
4) Have redundancy in your virtualization setup that allows you to have high availability, live migration, automated backups, etc.
What you get is an infrastructure that allows you to route around failures and schedule hardware swap outs on your own timetable, which can be far more economical.
If you don't have that then it does involve costly emergency response at 2am to replace a bare metal server that went down. You either pay somebody you have retained locally to do it, or you are the one driving down to the datacenter at 2am to do the replacement yourself with who-the-heck-knows how long it will take with uptime monitoring solutions sending out emails like crazy to the rest of the admin staff, and heavens help you, some execs that demanded to be in the loop from now on due to an "incident".
Don't know about you..... but I would rather be able to relax at 10pm and have a few beers once in a while (to the point I can't drive) without worrying about bare metal servers going down all the time, or who is on call, etc.
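
The "route around failures, swap hardware on your own timetable" idea from points 1 to 4 can be sketched as below. This is a toy model under stated assumptions, not any real hypervisor's API: `Host`, `evacuate`, and the repair queue are hypothetical names, and placement simply picks the host with the most free slots.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical bare-metal host in a virtualized pool."""
    name: str
    capacity: int                                  # max VMs this host can run
    vms: list[str] = field(default_factory=list)
    failed: bool = False


def evacuate(failed: Host, pool: list[Host], repair_queue: list[str]) -> None:
    """Restart a failed host's VMs elsewhere and queue the host for the next scheduled visit."""
    failed.failed = True
    for vm in list(failed.vms):
        candidates = [h for h in pool if not h.failed and len(h.vms) < h.capacity]
        if not candidates:
            raise RuntimeError(f"no spare capacity for {vm}")
        # Assumption: place each VM on the host with the most free slots.
        target = max(candidates, key=lambda h: h.capacity - len(h.vms))
        target.vms.append(vm)
        failed.vms.remove(vm)
    # Nobody drives in at 2am; the swap waits for the scheduled visit.
    repair_queue.append(failed.name)


pool = [
    Host("host-a", capacity=4, vms=["vm1", "vm2"]),
    Host("host-b", capacity=4, vms=["vm3"]),
    Host("host-c", capacity=4, vms=["vm4", "vm5"]),
]
repair_queue: list[str] = []
evacuate(pool[0], pool, repair_queue)
```

The sketch only works if the pool keeps enough headroom to absorb a failed host's workload, which is the redundancy point 4 is asking for.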
Wow, is AOL still around? by QuietLagoon · 2011-10-11 13:47 · Score: 2

What are they doing nowadays that requires multiple servers?
Re:It was to be staffed... by haus · 2011-10-11 14:43 · Score: 2

You do realize that this story is about AOL, correct spelling would simply be out of plase.
Re:So it will take ages for a fix by tehcyder · 2011-10-11 23:30 · Score: 2

This isn't scary. This is things getting better.
It's scary if your job is manually maintaining servers.

--
To have a right to do a thing is not at all the same as to be right in doing it