AOL Creates Fully Automated Data Center
miller60 writes with an except from a Data Center Knowledge article: "AOL has begun operations at a new data center that will be completely unmanned, with all monitoring and management being handled remotely. The new 'lights out' facility is part of a broader updating of AOL infrastructure that leverages virtualization and modular design to quickly deploy and manage server capacity. 'These changes have not been easy,' AOL's Mike Manos writes in a blog post about the new facility. 'It's always culturally tough being open to fundamentally changing business as usual.'"
Mike Manos's weblog post provides a look into AOL's internal infrastructure. It's easy to forget that AOL had to tackle scaling to tens of thousands of servers over a decade before the term Cloud was even coined.
Wow ... we were doing this 10 years ago before virtual systems were commonplace, 'computers on a card' where just coming out. Data center was 90 miles away. All monitoring and managing was done remotely. The only time we ever went to physical data center was if a physical piece of hardware had to be swapped out. Multiple IP addresses were configured per server so any single server one one tier could act as a fail over for another one on the same tier. We used firewalls to automate failovers, hardware failures were too infrequent to spend money on other methods. We could rebuild Sun servers in 10 minutes from saved images. All software updates were scripted and automated. A separate maintenance network was maintained. Logins were not allowed except on the maintenance network, and all ports where shutdown except for ssh. A remote serial interface provided hard-console access to each machine if the networks to a system wasn't available.
......
Yawn
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Seriously though, most telcomm operations operate like this. Their switching centers are all fully automated and unmanned, and usually in the basement of some non descript building. This is nothing new.
Feed the need: Digitaladdiction.net
AOL still exists? Wow. Yeah ok I guess this is the result of years of beancounter thinking - the expensive part of running the service and the reason they were losing money was the IT staff, huh? Glad I closed my CompuServe account before giving these guys any money.
Seven puppies were harmed during the making of this post.
In other news, the rest of AOL is expected to go "lights out" any time now.
Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
I thought everyone knew... AOL is the Internet.
No one ever had to evacuate a city because the solar panels broke!
The article states "failed equipment is addressed in a scheduled way using outsourced or vendor partners". They don't care if an individual server is down, they just move the workload elsewhere, and wait for a repair. So there actually will be people in their data center doing repairs, they just aren't AOL employees and aren't based in the data center. I could see making a decision that a longer wait time for repairs is justified by labor savings, but it isn't really obvious where those savings come from. There is a suggestion in the article that they want the flexibility to increase or decrease the number of workers as needed, which is somewhat easier with contractors than regular employees, but with regular employees you can get a similar effect from part time or overtime work.
They suck. They just suck differently now. They've switched from being an ISP to being a content company (and most of their content creators seems rather disgruntled). Mostly US-based, but most slashdotters should recognize names like TechCrunch or primarily HuffPo...the rest, not so much.
The new data center with 0 head count matches nicely the AOL user base with 0 head count!
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
One - If there is redundancy and virtualization, AOL can certainly keep services running while a tech goes in, maybe once a week, and swaps out the failed blades that have already beeen remotely disabled and their usual services relocated. this is not a problem. Our outfit here has a lights-out facility that sees a tech maybe every few weeks, and other than that a janitor keeps the dust bunnies at bay and makes sure the locks work daily. And yes, they've asked him to flip power switches and tell them what color the lights were. He's gotten used to this. that center doesn't have state-of-the-art stuff in it, either.
Two - Didn't AOL run on a mainframe (or more than one) in the 90s? It predated anything useful, even the Web I think. Netscape was being launched in 1998, Berners-Lee was making a NeXT browser in 1990, and AOL for Windows existed in 1991. Mosaic and Lynx were out in 1993. AOL sure didn't need any PC infrastructure, it predated even Trumpet Winsock, I think, and Linux. I don't think I could have surfed the Web in 1991 with a Windows machine, but I could use AOL.
deleting the extra space after periods so i can stay relevant, yeah.
The software still needs to be written. The programs still need to be run somewhere.
Technically not much has changed. The "Cloud" is still made up of servers that have to be administered. The main effect is that the IT and network admins will have to keep up with technology, especially the new virtualization layers between the hardware and the running application. But keeping up to date has always been a part of working in IT.
One of the major backbone providers has a lights-out data center not far from my work. I know a guy who has a hosting business there, and he's shown me around to the limits of his access. There is no one on-site from the company or its contractors--not even a security guard. They have biometrics plus PINs for access; it's laced with low-light/IR cameras (it wouldn't surprise me to learn they have microphones); it has motion detectors in case the cameras miss something; and the redundancy is incredible. They maintain contracts with local electricians, plumbers, and a few technical companies should a blade burn out. They manage the entire thing from a few states over, and as of a couple of years ago almost all of their data centers had been converted to run this way. Savings were good, something like a million dollars per DC per year even as unanticipated downtime decreased.
I looked at it and saw the future of IT. I wasn't sure if I was more impressed or scared.
You can never go home again... but I guess you can shop there.
Nobody will be there to see Skynet become self-aware. What... you thought the end of humanity wouldn't come from AOL?
Oh yeah, to house all the dial-up modems...
I hate being bipolar; it's awesome!
Who cares? I'm guessing you don't have much experience of server clusters but generally, long before you get to the kind of scale we are talking about here, you start treating servers in the same way you might treat HDDs in a RAID array. When one fails, other servers in the cluster pick up the slack until you can either repair the broken unit or you simply remote install the appropriate image onto a standby server and bring that up until an engineer physically goes to site. Handling of the data is somewhat critical though; should a server die you ideally need to be able to resume what it was working on seemlessly and without causing any data corruption; think transaction based DB queries and timeout/retry.
If you have enough spare servers and you can easily get by with engineers only needing to go on site once a month or so, assuming you get your MTBF calculations right that is. There's a good white paper by Google on how 200,000 hr MTBF hard drive failure rates equate to drive failures every few hours when you have a few 100k HDs.
UNIX? They're not even circumcised! Savages!
The whole idea is not to need to get to stuff quicker at all.
If you are:
1) Completely virtualized.
2) Use power circuits that are monitored for load, on a battery back up, power conditioners, and diesel fuel generators for local utility backup.
3) Use management devices to control all your bare metal as if you are standing there, complete with USB connected storage per device that you can swap out the iso for.
4) Have redundancy in your virtualization setup that allows you to have high availability, live migration, automated backups, etc.
What you get is an infrastructure that allows you to route around failures and schedule hardware swap outs on your own timetable, which can be far more economical.
If you don't have that then it does involve costly emergency response at 2am to replace a bare metal server that went down. You either pay somebody you have retained locally to do it, or you are the one driving down to the datacenter at 2am to do the replacement yourself with who-the-heck-knows how long it will take with uptime monitoring solutions sending out emails like crazy to the rest of the admin staff, and heavens help you, some execs that demanded to be in the loop from now on due to an "incident".
Don't know about you..... but I would rather be able to relax at 10pm and have a few beers once awhile (to the point I can't drive) without worrying about bare metal servers going down all the time, or who is on call, etc.
What are they doing nowadays that requires multiple servers?
You do realize that this story is about AOL, correct spelling would simply be out of plase.
This isn't scary. This is things getting better.
It's scary if your job is manually maintaining servers.
To have a right to do a thing is not at all the same as to be right in doing it