On World of Warcraft's Network Issues
alphaneutrino writes to mention a C|Net article discussing some of the recent problems the World of Warcraft playerbase has experienced. From the article: "'Being a system administrator myself, I have some understanding of what goes on in a corporate data center,' said Evgeny Krevets, a sometimes-frustrated WoW player. 'I don't know Blizzard's system setup. What I do know is that if I kept performing 'urgent maintenance' and taking the service down without warning for eight-hour periods, I would be out of a job.' Blizzard blames some of the problems--such as the disconnection, for several hours on Friday, of players linked to several servers--on AT&T, its network provider. (AT&T did not respond to a request for comment.)"
Sunday: The day the server stood still
Monday: *gasp*, playable (until 11pm)
Tuesday: Weekly Maintenance Day. Nothing else EVER needs to be said about this day.
Wednesday: Playable (until 11pm), with a good chance of maintenance aftermath.
Thursday: The day of 10-second instant casts in MC & BWL.
Yeah, it goes on. Our server reliably bites the dust around 11pm every night for six hours, not to mention the constant plague of login issues and 30-minute loading screens during peak hours. Funny how this is all on a low-to-medium-population server.
...while you're not an idiot, I can understand how they could end up with one supplier for bandwidth.
:-)
1) You need an SLA with each ISP you pull a backbone-level feed from. You can use InterNAP and hook into the peering points in the US and a few other places, but it has its own issues, and if you use only them, you're still with only one ISP; if they fail, you're still up a creek without a paddle.
2) You'd need to house the servers in one massive data center with a HUGE honking data pipe from each ISP, running BGP on the inbound routers between the ISPs and your DMZ so that a single IP address range for the front-facing servers is announced through every ISP.
OR
Come up with some sort of nasty DNS trick to make the server front-ends (hopefully) transparent to the clients and spread them across multiple IP blocks (which is what epicRealm did to make their CDN actually completely transparent to client and customer, and to be able to handle dynamic HTTP content...). But be prepared: for this to work right, you need to trust the client's state, share state across server pools on different IP blocks, be stateless, or do something along those lines.
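The DNS approach boils down to handing each client front-end addresses from more than one ISP's IP block, and dropping a block from the answers while its uplink is down. Below is a minimal Python sketch of just that selection logic, assuming two hypothetical pools on documentation address ranges; `FrontEndPool` and `RoundRobinResolver` are illustrative names, not any real DNS server's API. A real setup would put this logic in an authoritative DNS server with short TTLs.

```python
import itertools
from dataclasses import dataclass


@dataclass
class FrontEndPool:
    """Hypothetical group of front-ends reachable through one ISP's IP block."""
    isp: str
    addresses: list[str]
    healthy: bool = True  # flipped to False when that ISP's uplink is down


class RoundRobinResolver:
    """Toy model of a DNS answer policy spanning several IP blocks."""

    def __init__(self, pools: list[FrontEndPool]) -> None:
        self.pools = pools
        self._counter = itertools.count()

    def resolve(self, max_answers: int = 2) -> list[str]:
        healthy = [pool.addresses for pool in self.pools if pool.healthy]
        # Interleave pools so consecutive answers come from different IP blocks.
        live = [
            addr
            for group in itertools.zip_longest(*healthy)
            for addr in group
            if addr is not None
        ]
        if not live:
            return []  # every uplink is down; nothing useful to hand out
        # Rotate the starting point on each query to spread load.
        start = next(self._counter) % len(live)
        rotated = live[start:] + live[:start]
        return rotated[:max_answers]
```

Because the answers alternate between blocks, a client whose first address is unreachable can fall back to one on the other ISP. The catch is caching: resolvers holding an address on a dead block keep handing it out until the record's TTL expires.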
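The "trust the client's state" option is only safe if the client cannot forge that state. One common way to get there is for servers to sign the state with a key shared by every pool, so a front-end on any IP block can verify it without a shared session store. The sketch below assumes HMAC-SHA256 over a JSON payload; `SECRET_KEY`, `issue_token`, and `read_token` are hypothetical names chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical key distributed to every front-end pool, whatever IP block it is on.
SECRET_KEY = b"replace-with-a-real-shared-secret"


def issue_token(state: dict) -> str:
    """Serialize session state and append an HMAC tag over it."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(tag).decode()
    )


def read_token(token: str) -> Optional[dict]:
    """Return the state if the tag verifies, otherwise None."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)
```

The payload is signed, not encrypted, so the client can still read it, and the token never expires; a real design would add an expiry field and key rotation.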
There's a bunch more, but the first item and the two alternatives above will hopefully show you why someone (a bean counter, most likely...) will decide to simply hold the ISP or Tier-1 host (most likely the latter here; they're very probably colocated at an AT&T Tier-1 facility...) to the SLA they promised. It's cheaper and waaay simpler if everything goes right, and they're "not to blame" if things go wrong. If you went an alternate route and had a mishap that wasn't server-related, then you'd be to blame and have nobody to point fingers at when it all broke (and you just KNOW it will at some point; it always does...).
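The cost argument has an availability side too. If uplink failures are independent (a shared facility or shared fiber path breaks that assumption), a service that stays up while any one ISP is up goes down only when all of them fail at once. The sketch below works through that arithmetic; the 99.5% per-ISP figure is a hypothetical number for illustration, not AT&T's or Blizzard's actual SLA, and the function names are made up.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(*availabilities: float) -> float:
    """Availability when the service survives as long as ANY uplink is up.

    Assumes uplink failures are independent of each other.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a  # probability that every uplink is down at once
    return 1.0 - all_down


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year that the service is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = 0.995  # hypothetical per-ISP availability
print(f"{downtime_hours_per_year(single):.1f} h/yr")  # 43.8 h/yr
print(f"{downtime_hours_per_year(combined_availability(single, single)):.2f} h/yr")  # 0.22 h/yr
```

Under those assumptions, one uplink allows about 43.8 hours of downtime a year, while two independent ones cut it to roughly 13 minutes. That gap is what gets weighed against the cost and complexity of multihoming.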