Slashdot Mirror


Huge Traffic On Wikipedia's Non-Profit Budget

miller60 writes "'As a non-profit running one of the world's busiest web destinations, Wikipedia provides an unusual case study of a high-performance site. In an era when Google and Microsoft can spend $500 million on one of their global data center projects, Wikipedia's infrastructure runs on fewer than 300 servers housed in a single data center in Tampa, Fla.' Domas Mituzas of MySQL/Sun gave a presentation Monday at the Velocity conference that provided an inside look at the technology behind Wikipedia, which he calls an 'operations underdog.'"

18 of 240 comments

  1. Re:Impressive by sm62704 · · Score: 3, Interesting

    I was always impressed with how fast pages loaded; after seeing how small their operation is, I'm even more impressed now!

    Go to any newspaper, from the NYT to one in a smaller city (say, Springfield's State Journal-Register), and the difference in load times is HUGE. It probably has to do with all the ads served from third-party servers on the newspapers' pages. What's the use of having a humongous server with giant pipes if your readers' pages have to wait for a Flash ad served from a 486 powered by gerbils?

    If I link to the SJR from one of my journals, it slows down! I mean, I can see it if a front-page story is slashdotting a little paper like that, but come on, a user journal?

    And Wikipedia isn't all their servers serve; iinm the uncyclopedia shares servers. Impressive, indeed.

    --
    mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  2. More importantly by wolf12886 · · Score: 5, Interesting

    I don't care how few servers they have; what's more interesting to me is that they run an ultra-high-traffic site, which they aren't having trouble paying for, and do it without ads.

  3. Off-topic, I know, but...what about /.'s hardware? by kiwimate · · Score: 5, Interesting

    I.e. the promised follow-up to this story about moving to the new Chicago datacenter? You know, the one where Mr. Taco promised a follow-up story "in a few days" about the "ridiculously overpowered new hardware".

    I was quite looking forward to that, but it never eventuated, unless I missed it. It's certainly not filed under Topics->Slashdot.

  4. Re:What is the role of Open Source by KokorHekkus · · Score: 4, Interesting

    The wiki software, MediaWiki, was written for Wikipedia and is licensed under the GPL (http://www.mediawiki.org/wiki/How_does_MediaWiki_work%3F). According to Wikipedia, they use MySQL as their database and run it all on Linux servers.

  5. Re:Note to self by Ron+Bennett · · Score: 4, Interesting

    Or do a hurricane dance, and let nature do its thing...

    Having all their servers in Tampa, FL (of all places, given the hurricanes, frequent lightning, flooding, etc. there) doesn't seem too smart - I would have thought, given Wikipedia's popularity, their servers would be geographically spread out in multiple locations.

    Though doing that adds a level of complexity and cost that even many for-profit ventures, such as Slashdot, likely can't afford / justify; Slashdot's servers are in one place - Chicago ... to digress a bit, I notice this site's accessibility has been spotty (i.e. more page-not-found errors / timeouts lately) since the server move.

    Ron

  6. Re:Impressive by Bandman · · Score: 3, Interesting

    Yeah, a single datacenter seems really risky, especially considering some of the shenanigans that have been going on.

  7. Re:I was just thinking that by imstanny · · Score: 2, Interesting

    But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.

    On one hand, you're right, efficiency is admirable. But on the other hand, if Google has insane amounts of processing power, it would likely mean much higher barriers to entry for its competitors. The threat of Google's power in processing such data could deter others from even attempting to compete with Google. After all, when Google started it was only funded with a few hundred thousand dollars.
  8. Simplicity by wsanders · · Score: 5, Interesting

    Although much of the MediaWiki software is a hideous twitching blob of PHP Hell, the base functionality is fairly simple and can run perpetually and scale massively as long as you don't mess with it.

    What spoils a lot of projects like this is the constant need for customization. MediaWiki essentially can't be customized (except for plugins, obviously, which you install at your own peril), and that is a big reason why it scales so massively.

    As for Wikipedia itself, I suspect it is massively weighted in favor of reads. That simplifies circumstances a lot.

    --
    Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
  9. Re:The power of low standards by WaltBusterkeys · · Score: 4, Interesting

    Exactly. A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that. Six nines works out to about 30 seconds of downtime per year.

    It seems like Wikipedia is getting things right 99% of the time, or maybe even 99.9% of the time ("three nines"). That's a pretty low standard relative to how most companies do business.
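
The "nines" figures above can be checked with a little arithmetic. This is a minimal sketch, assuming a 365-day year and that all downtime counts equally; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year at an availability of `nines` nines.

    Two nines is 99%, three nines is 99.9%, six nines is 99.9999%.
    """
    unavailability = 10 ** (-nines)  # e.g. 1e-06 for six nines
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 3, 6):
        seconds = downtime_seconds_per_year(n)
        print(f"{n} nines: {seconds:,.1f} s/year ({seconds / 3600:,.2f} hours)")
```

Under those assumptions, six nines allows roughly 31.5 seconds of downtime a year (in line with the "about 30 seconds" quoted above), three nines allows about 8.76 hours, and two nines about 3.65 days.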

  10. Re:What amazes me... by ceejayoz · · Score: 4, Interesting

    Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.

    Wikipedia's probably getting hit with hundreds of times the traffic Slashdot is at all times.

  11. Re:Impressive by Bandman · · Score: 2, Interesting

    That would make a lot more sense.

    Given the sheer number of people who access it, it seems like the perfect use for GSLB (global server load balancing).

  12. Wikipedia = much more traffic than slashdot by Anonymous Coward · · Score: 5, Interesting

    Slashdot does .. what? 40 Mbit/s of traffic at peak? Wikipedia is roughly 100 times larger. (And WP has three datacenters, not one.)

    Slashdot traffic hasn't created noticeable blips on Wikipedia's radar for years.

    OTOH, if Wikipedia linked to Slashdot on every page, Slashdot would go down, if due to nothing else but bandwidth exhaustion.

  13. Re:I was just thinking that by Chris+Burke · · Score: 4, Interesting

    I don't actually know anything about the total computing power Google employs, but I do know that they will purchase on the order of 1,000-10,000 processors merely to evaluate them prior to making a real purchase.

    --

    The enemies of Democracy are
  14. Re:Confused by the title by quanticle · · Score: 2, Interesting

    Good point. Perfect example: the Bill and Melinda Gates Foundation has a budget of billions of dollars, easily exceeding the budget of many private corporations.

    --
    We all know what to do, but we don't know how to get re-elected once we have done it
  15. Re:The power of low standards by Anonymous Coward · · Score: 2, Interesting

    Right, banks actually traditionally used such techniques as planned downtime to allow for maintenance. The "banker's hours" allowed for a large period of time, daily, where little-to-no 'data' was changing in the system and the system could be 'balanced'.

  16. Re:The power of low standards by az-saguaro · · Score: 3, Interesting

    Your reasoning may be a bit specious. If your databases get "several thousand writes per second", it sounds like this may be massive underuse of your bandwidth - i.e. your servers or databases may be able to handle hundreds of thousands or millions of writes per second.

    If the system went down for a few seconds, the incoming traffic might get cached or queued, waiting for services to come back online. Once the connection is re-established, the write backlog might take only a few seconds, or a fraction of a second, to catch up and be back to real time. Users might be unaware of the whole thing, or they would log in again and retry, and there would be no perceptible throttle or bottleneck to data logging.

    Any system that presses its bandwidth limits, walking dangerously close to its top capacity with no reserves, is likely to be down quite a bit. A system such as yours, which hardly taxes its bandwidth at all (I am guessing), could certainly tolerate lost seconds. Admittedly, your system may have had problems like this in the past, and the system was upgraded to handle higher capacity. . . . Which is why Wikipedia no longer runs on just one machine.

    It does sound as though Wikipedia may have found a sweet spot, balancing load against reserve capacity or bandwidth, for robust up-time versus economic efficiency. I am sure that this is a topic that computer and network engineers have studied exhaustively - perhaps someone else knows?
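
The catch-up argument in this comment can be put into numbers. This is a rough sketch, assuming writes arrive at a constant rate, are queued rather than dropped during the outage, and are then drained at a fixed capacity; the function name and the sample figures are hypothetical, not measurements from any real system.

```python
def catch_up_seconds(arrival_rate: float, capacity: float, outage_seconds: float) -> float:
    """Time needed to drain the backlog that built up during an outage.

    arrival_rate   -- incoming writes per second (assumed constant)
    capacity       -- writes per second the system can process once it is back
    outage_seconds -- how long writes were queued instead of processed
    """
    if capacity <= arrival_rate:
        # With no spare capacity the queue only grows, so it never catches up.
        raise ValueError("no spare capacity: the backlog never drains")
    backlog = arrival_rate * outage_seconds
    # New writes keep arriving while draining, so only the spare capacity helps.
    return backlog / (capacity - arrival_rate)


if __name__ == "__main__":
    # Hypothetical: 5,000 writes/s arriving, 3-second outage.
    print(catch_up_seconds(5_000, 500_000, 3))  # lots of headroom
    print(catch_up_seconds(5_000, 5_500, 3))    # running close to capacity
```

With those made-up numbers, a system with a hundredfold headroom recovers from a 3-second outage in about 0.03 seconds, while one running at roughly 90% of capacity needs 30 seconds, which is the contrast the comment is drawing.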

  17. Re:I was just thinking that by kiwimate · · Score: 3, Interesting

    You know what I thought was interesting? This story (which was linked to from this /. story titled A Look At the Workings of Google's Data Centers) contained the following snippets.

    On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory--you know the drill.

    and

    While Google uses ordinary hardware components for its servers...

    But this was immediately followed by:

    it doesn't use conventional packaging. Google required Intel to create custom circuit boards.

    For some reason I'd always believed they used pretty much standard components in everything.

  18. Re:I was just thinking that by Crazyswedishguy · · Score: 2, Interesting

    After all, when Google started it was only funded with a few hundred thousand dollars.

    Then again, when Google started, the Internet itself was considerably smaller, and the pages indexed by Google were much fewer. It was also slower and processing power wasn't as much of a limiting factor as your network connection.

    Although the idea that Google may in fact be serving all our searches with just one server seems kind of appealing, let's not kid ourselves: they have many large data centers. They use relatively cheap, commonplace equipment, but in every data center they have guys with shopping carts (really) swapping out defective servers as they walk down the aisles. (Their infrastructure and file system are really interesting, actually.)
    But don't forget that Google doesn't just provide search. They also provide storage-intensive services such as email (more than 6GB of storage space per account now, I think) or video (YouTube). One of the main reasons for having many data centers is to be able to push content (email, YouTube videos, etc.) as close as possible to the end user before the user asks for it, to minimize latency. If user A in NY wants to watch a video, it is much faster to send it from a data center in NY than from one in CA. Serving video content, or large amounts of data generally, is a very capital-intensive business that requires a lot of network and server infrastructure.
    --
    This space up for sale.