On World of Warcraft's Network Issues
alphaneutrino writes to mention a C|Net article discussing some of the recent problems the World of Warcraft playerbase has experienced. From the article: "'Being a system administrator myself, I have some understanding of what goes on in a corporate data center,' said Evgeny Krevets, a sometimes-frustrated WoW player. 'I don't know Blizzard's system setup. What I do know is that if I kept performing 'urgent maintenance' and taking the service down without warning for eight-hour periods, I would be out of a job.' Blizzard blames some of the problems--such as the disconnection, for several hours on Friday, of players linked to several servers--on AT&T, its network provider. (AT&T did not respond to a request for comment.) "
I've barely even seen any issues since patch 1.10. I think on patch day the servers were down all day, but that's to be expected.
Server performance varies from realm to realm. I hadn't really had any issues until the last week or two, when my server decided to drop 40 minutes into our 45-minute baron run, and then again in the BGs later on.
As someone else mentioned, I think they are still a victim of their own success. Sure it's been over a year since launch, but they were expecting 250,000 subscribers and got 6,000,000.
The problem doesn't seem to be how much they spend but where they spend their money. According to the article, AT&T seems to be their only network provider. Who thinks that makes sense? They have a huge, bandwidth-hungry product and rely on one provider for it. I would never host a commercial web site on a host with a single provider, let alone a huge undertaking like WoW.
But, then again, I may also be an idiot... who knows?
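The single-provider point can be put in numbers with the usual parallel-redundancy formula: if each upstream link is independently up a fraction A of the time, then n links are all down together only (1 - A)^n of the time. Below is a minimal Python sketch. The 99% per-link figure is an illustrative assumption, not a number from the article. Real provider outages are often correlated (shared fiber, shared peering points), so treat the result as a best case.

```python
def parallel_availability(per_link: float, links: int) -> float:
    """Availability of a service that stays reachable while at least one of
    `links` upstream links is up, assuming the links fail independently."""
    # Probability that every link is down at the same moment.
    all_down = (1.0 - per_link) ** links
    return 1.0 - all_down

# Illustrative only: each provider assumed to be up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Going from one provider to two cuts the expected "everything unreachable" time by a factor of 100 under these assumptions. That is the whole argument for multi-homing.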
Sounds like WoW has a house of cards network with single point of failure architecture problems.
And that AT&T is exploiting them, marketing a new "premium service/support" contract by letting them go down.
I can't wait until WoW has to pay AT&T (and its handful of competitors, if they get rid of the single point of failure) the extra "premium tier" routing fees, once the telcos market their "nonneutral" Internet. Because a world of angry Warcraft players jonesing for their fix will be a nice gift for telco suits just trying to make it home from work.
--
make install -not war
A large part of the problem is that Blizzard's communication with the player base sucks, to speak frankly. The login server for their forums seems to be one and the same as the login server for the game itself, so when that goes down the forums tend to shut down as well. There is a "Realm Status" page which purports to show the real-time status of the various servers, but which is frequently unreachable. There is a "Realm Status" forum which *might* contain some acknowledgement of a problem while the problem is still ongoing, but usually doesn't. When you start up the game client, Blizzard can stick up a 'News' window on your screen but, again, the appearance of any news often lags the problem, even severe problems, by a matter of hours. And, of course, Blizzard's chief form of communication with players is Community Managers on the forums, who themselves tend to be given dick in the way of information, are extremely controlled in what they can and cannot say, and who are (honestly, I'm not joking) tasked with yelling at users for posting subject headers that contain excessive capitalization; what an obscene waste of resources.
Seriously, a little timely information goes a long way. Yes, I agree that the downtime they have is absurd; consider that *every Tuesday* the game goes offline for *six hours* of maintenance. That's *planned, scheduled* downtime, folks, so that *alone* means they aren't even attempting to have greater than 96.4% uptime, and I can't think of another commercial service for which you pay a monthly fee where that would be even remotely acceptable; if your cable or your phone just plain didn't work for 6 hours every Tuesday, heads would roll. Then things just get asinine when you factor in all the spontaneous, freewheeling, unplanned downtime as well.
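The 96.4% figure checks out. A week has 7 × 24 = 168 hours, so six hours of scheduled maintenance leaves at most 162/168 of the week, about 96.43%, before any unplanned outages are counted. A quick Python check:

```python
HOURS_PER_WEEK = 7 * 24  # 168

def max_weekly_uptime(maintenance_hours: float) -> float:
    """Best-case uptime percentage when the service is down for
    `maintenance_hours` of planned maintenance every week."""
    return 100.0 * (HOURS_PER_WEEK - maintenance_hours) / HOURS_PER_WEEK

print(f"{max_weekly_uptime(6):.2f}%")  # six hours every Tuesday; prints 96.43%
```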
But know what? I'd feel a lot better about it if, when something shits the bed, or goes tits-up, or whatever colorful metaphor you'd use to describe a server-killing technical problem, Blizzard would tell us, promptly, as they receive the information themselves:
1. We know there's a problem.
2. We know what the problem is.
3. Here's what we're doing to fix it.
4. Here's when we expect it to be fixed.
5. Update as old information becomes obsolete.
They don't do this. A few hours after something happens, you might get some of the above information. Or you might not. Usually, it's the latter.
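The five points above amount to a very small data structure: one timestamped post per update, with unknown fields stated as unknown rather than left out, and the newest post superseding the older ones. Below is a minimal Python sketch of that idea. The class and field names are hypothetical and are not anything Blizzard actually uses. The example cause and ETA are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class StatusUpdate:
    """One public status post (hypothetical names).
    Merely posting one covers point 1: we know there's a problem."""
    cause: Optional[str] = None    # 2. what the problem is, once known
    action: Optional[str] = None   # 3. what is being done to fix it
    eta: Optional[str] = None      # 4. when it is expected to be fixed
    posted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def render(self) -> str:
        stamp = self.posted_at.strftime("%Y-%m-%d %H:%M UTC")
        return "\n".join([
            f"[{stamp}] We know there is a problem.",
            f"Cause: {self.cause or 'still under investigation'}",
            f"Action: {self.action or 'being determined'}",
            f"Expected fix: {self.eta or 'unknown; next update to follow'}",
        ])

def current(updates: list[StatusUpdate]) -> StatusUpdate:
    # 5. Each newer post supersedes the older ones.
    return max(updates, key=lambda u: u.posted_at)

# Illustrative usage with invented details: post immediately, refine later.
t0 = datetime.now(timezone.utc)
updates = [
    StatusUpdate(posted_at=t0),
    StatusUpdate(cause="upstream network outage", action="rerouting traffic",
                 eta="about 2 hours", posted_at=t0 + timedelta(minutes=30)),
]
print(current(updates).render())
```

The point of the design is that the first post can go out within minutes with nothing but an acknowledgement. Nothing has to be known yet for point 1 to be satisfied.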
I don't play any online games, but I thought the whole idea of them was that you subscribe to the service so that it's available just about 24x7, whenever you feel like jumping in. Sure, occasional outages are to be expected, but if it gets to the stage where the game is frequently slow or unavailable, the common-sense solution would be to cancel your subscription until Blizzard (or whoever) improves the service they deliver to you. If enough people did this, they'd have to do something about it...
I'm sorry but I think far too many people have become "slaves" to marketing by truly believing that they simply cannot do without a lot of the products & services that they pay good money for - to the point where they "need" those items so much that they're afraid of complaining in case they're denied those things completely.
Gentoo Linux - another day, another USE flag.
If these problems are really related to AT&T, then why do we Germans experience exactly the same problems? Over here, T-Online is the bad guy. To solve the problem, Blizzard even suggested changing the MTU of your DSL connection to 1400. I don't know how many people have ever heard of a thing called MTU (the common people, not the nerds here ;-) ).
Blizzard should ask themselves why the whole IT infrastructure is having problems with their product, and whether it is really the ISP's fault.
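For readers who haven't heard of it: the MTU (maximum transmission unit) is the largest IP packet a link will carry in one piece. Standard Ethernet allows 1500 bytes. PPPoE, which many DSL lines use, spends 8 bytes per packet on its own headers, which leaves 1492. If a packet is too big for some link along the path and path-MTU discovery fails, the packet can be fragmented or silently dropped, and to a player that looks like a hang or a disconnect. Setting the MTU to 1400 leaves headroom for that kind of overhead. The Python sketch below only does the header arithmetic for IPv4 TCP without options. It makes no claim that this was the actual cause of the T-Online problems.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
TCP_HEADER = 20      # bytes, TCP header without options
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field

def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per packet (MSS) for a given IPv4 MTU, no options."""
    return mtu - IPV4_HEADER - TCP_HEADER

for label, mtu in [("Ethernet", 1500),
                   ("PPPoE DSL", 1500 - PPPOE_OVERHEAD),
                   ("Blizzard's suggestion", 1400)]:
    print(f"{label:22} MTU {mtu:5}  ->  MSS {tcp_mss(mtu)}")
```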
Actually, that's how software maintenance happens in the real world.
Real code is complex, and is generally written as a massive matrix of interrelated side effects causing things to happen*. When it gets written, the entire matrix is designed, intended, documented, and understood. Two years later, the guys working on the code have no clue about the matrix of side-effect-driven code, and no clue about the complex set of business factors driving its technical aspects (and by business factors, in an MMORPG I mean things like: class X has bad faction with everybody, making it more difficult for him to start out, but in return for overcoming that challenge he has more powerful magic later in life; stuff like that). So when they make a change, they go in, find the one line of code that looks like what needs to be fixed, and just change it, without knowing all the places that change will ripple back to, invisibly, via the side-effect matrix.
A technical phrase to understand here is 'globally scoped variables' - and another one is 'design intent' - and as the current set of hacks don't understand the ramifications or scope of either, this is what happens.
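Here is a toy illustration of the 'globally scoped variables' point, in Python. The names and the XP rules are invented and not taken from WoW's code. One function quietly reads a module-level value. A later "one-line fix" elsewhere changes that value, and the first function's behavior shifts even though nobody touched it:

```python
# Hypothetical game code: two features silently share one global.
XP_MULTIPLIER = 1.0  # originally meant only for one narrow purpose

def quest_reward_xp(base_xp: int) -> int:
    # Written long ago; depends on XP_MULTIPLIER without saying so.
    return int(base_xp * XP_MULTIPLIER)

def apply_rested_bonus() -> None:
    # A later fix for rested XP that reuses the same global.
    global XP_MULTIPLIER
    XP_MULTIPLIER = 2.0

print(quest_reward_xp(100))  # prints 100
apply_rested_bonus()
print(quest_reward_xp(100))  # prints 200: quest rewards changed too

def quest_reward_xp_explicit(base_xp: int, multiplier: float = 1.0) -> int:
    # Same rule, but the dependency is visible in the signature.
    return int(base_xp * multiplier)
```

The explicit version doesn't remove the need to understand design intent. It just makes the dependency show up where a maintainer will see it.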
Footnotes
* I didn't say it was a good idea. I just said it happens.
Glonoinha the MebiByte Slayer
As someone else mentioned, I think they are still a victim of their own success. Sure it's been over a year since launch, but they were expecting 250,000 subscribers and got 6,000,000.
The controlling factor for their server performance should not be the total number of subscribers, but the number of subscribers per realm, and Blizzard has complete control over that number, because they can mark a realm as "full" and disallow logins/signups. IOW, as you know, those 6,000,000 people are not all playing in the same game at the same time.
It should be possible to make the realms completely independent, so that this just becomes a matter of horizontal scaling, and having hardware/systems monkeys roll out new realms via some standard operating procedure.
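The "independent realms" argument is plain horizontal scaling. The total subscriber count only matters through how many realms you have to run, because each realm's load is capped. Below is a minimal Python sketch, assuming fully independent realms and a fixed per-realm cap. Both are simplifications, and the class names are hypothetical. New players go to the first realm with room, and a new realm is brought up only when every existing one is full.

```python
from dataclasses import dataclass, field

@dataclass
class Realm:
    name: str
    capacity: int                                   # hypothetical player cap
    players: set[str] = field(default_factory=set)

    @property
    def full(self) -> bool:
        return len(self.players) >= self.capacity

class RealmPool:
    """Places players on the first non-full realm; when all realms are
    full, adds a new, independent realm (scale out, not up)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.realms: list[Realm] = []

    def _new_realm(self) -> Realm:
        realm = Realm(f"realm-{len(self.realms) + 1}", self.capacity)
        self.realms.append(realm)
        return realm

    def place(self, player: str) -> str:
        realm = next((r for r in self.realms if not r.full), None) or self._new_realm()
        realm.players.add(player)
        return realm.name

pool = RealmPool(capacity=3)
assignments = [pool.place(f"player{i}") for i in range(7)]
print(assignments)
```

With a cap of 3, seven players end up spread over three realms. No single realm ever sees more than three, however many players sign up in total.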
Unfortunately, based on the rumors I have heard, Blizzard has chosen to tie a bunch of stuff together. For instance, the common web forums use the characters from all the realms (the web forums know about your level 23 mage), they have a single set of auth servers, it's not clear whether the item databases are shared between realms, and so on. This is sort of sad, because it's not like Blizzard are the first people to roll out an MMORPG.
Now, some might argue that tying some of this stuff together makes for a better user experience. However, when this entanglement leads to downtimes, one could make the argument that it's not worth it.
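The cost of that entanglement can be put in numbers. If a realm can only run when every shared service it depends on is also up, their availabilities multiply. Each shared dependency therefore lowers every realm's ceiling, and an outage of the shared piece takes down all realms at once. Below is a Python sketch. The 99.9% figures are assumptions for illustration. Whether Blizzard's realms really depend on a shared item database is the commenter's rumor, not a confirmed fact.

```python
from math import prod

def series_availability(*components: float) -> float:
    """Availability of a service that needs every listed component up
    at the same time, assuming independent failures."""
    return prod(components)

# Illustrative figures only: each component assumed up 99.9% of the time.
realm_alone = series_availability(0.999)
realm_entangled = series_availability(0.999, 0.999, 0.999)  # realm + auth + item DB
print(f"{realm_alone:.4%} vs {realm_entangled:.4%}")
```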
Anyway, my point is not to bash on Blizzard; I'm sure they've made some difficult design decisions correctly, and some difficult ones incorrectly. My point is that "we have lots of users" is not a good excuse when you have a service that lets you divide those users into sub-populations, and that there are probably architectural improvements they could make to improve their scalability. The real question is whether they have competent and experienced systems engineers to help them make those improvements, and whether management is committed to supporting them.
Anyway, so much for pre-coffee ramblings....
I love that this is exactly the same as with the EverQuest servers. People constantly said that they would not buy from Sony etc. again because of the problems, nerfs, lack of support, etc. It seems as if these issues are inherent to MMORPGs.