Slashdot Mirror


Yahoo! Orders Wikipedia Hardware

Edit This Page writes "Jimmy Wales announced today that Yahoo! has ordered 23 HP servers for the Wikimedia Foundation. The three database servers are model DL 385, and will come with dual Athlons, 8GB of RAM, and 6x 146GB 15K RPM drives each. They will also provide rackspace and bandwidth. The announcement comes four months after Google's announcement of support, and two months after Yahoo's own. Google has not yet made their intentions clear. You can read more about the specifications of what will soon be a 100+ server cluster at the Wikimedia Servers wiki article."

48 of 240 comments

  1. Also! by Raul654 · · Score: 5, Informative

    As I write this, our developers are switching the entire site over to MediaWiki 1.5 (from 1.4), and most of the changes will make it run faster. So we're lowering the per-transaction cost of the software and increasing the server capacity -- this is a good thing.

    --


    To make laws that man cannot, and will not obey, serves to bring all law into contempt.
    --E.C. Stanton
    1. Re:Also! by slavemowgli · · Score: 2, Insightful

      Out of curiosity, why are you switching to 1.5 already when the last release is still listed as "not recommended for use in a production environment"?

      --
      quidquid latine dictum sit altum videtur.
    2. Re:Also! by Jon+Chatow · · Score: 5, Informative
      Because the devs and the sysadmins are one and the same (generally), and they like playing with fire. :-)

      Seriously, "not recommended" is because it hasn't been properly tested yet in a large-scale environment; this is what is being done right now. If this version of MediaWiki works for Wikimedia, it should work for everyone else, too (barring the funny odd bits we don't use).

      --
      James F.
    3. Re:Also! by Jon+Chatow · · Score: 4, Interesting

      Unicode is assumed for 1.5, so all wikis will be converted as part of the transition process, including the English Wikipedia.

      --
      James F.
    4. Re:Also! by Jamesday · · Score: 2, Interesting

      Because the technical team at Wikipedia includes the developers and we know that there are sure to be problems as it is introduced to full service. Anything from outright bugs to database queries with unacceptable load properties. It'll probably be released for a general audience in four to eight weeks, once it's been very thoroughly tested at its biggest user site.

  2. Not Just Software... by __aaclcg7560 · · Score: 5, Funny

    Wikipedia Hardware?! I didn't know they make hardware. Does anyone have the Wikipedia link for this? ;)

    1. Re:Not Just Software... by Seindal · · Score: 5, Funny

      Everybody can add their own transistors.

      --
      René Seindal
    2. Re:Not Just Software... by jrockway · · Score: 2, Funny

      Not if you're Intel. Then you just ship a broken one.

      --
      My other car is first.
  3. required? by cryptoz · · Score: 2, Interesting

    Does wikipedia seriously need all that? I thought the data they were serving up was mostly just text and wasn't really a huge problem. As in, weren't their current servers enough? Or am I missing something?

    1. Re:required? by xMilkmanDanx · · Score: 5, Informative

      Just think of all the links that get posted in slashdot to wikipedia and it doesn't falter under the load. That and it's not just static pages, between building, rebuilding, keeping reversion history, indexing for searches and constant slashdotting...

    2. Re:required? by Jon+Chatow · · Score: 5, Informative

      Actually, no, bandwidth (I'll assume here that you meant "throughput" ;-)) problems are not significant, it's much more the actual server hardware. Wikis are very database- and CPU-heavy.

      --
      James F.
    3. Re:required? by teslatug · · Score: 5, Insightful

      Have you looked at the MediaWiki features? There are tons of dynamic features. What doesn't hit the cache goes to the DB. Wikipedia is 67th in the Alexa ratings (Slashdot is 1,441st, of course not too many slashdotters use Alexa, but check some of the other sites, CNN is in the 20s, and Wikipedia gets more traffic in a day than /. gets in a month).

      Additionally, Wikipedia's lag is a dampening factor to its popularity. As more servers are added, it becomes more responsive, servers go to capacity again, and yet more hardware is needed.
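
      The "what doesn't hit the cache goes to the DB" path can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `PageCache` and `render_from_db` are hypothetical names, and this is not MediaWiki or Squid code, just the general cache-then-backend pattern.

```python
class PageCache:
    """Toy cache in front of an expensive page renderer (hypothetical)."""

    def __init__(self, render_from_db):
        self._render = render_from_db  # called only on a cache miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, title):
        # Serve the cached copy if one exists.
        if title in self._store:
            self.hits += 1
            return self._store[title]
        # Otherwise do the expensive work and remember the result.
        self.misses += 1
        page = self._render(title)
        self._store[title] = page
        return page

    def invalidate(self, title):
        # An edit must drop the cached copy so readers see fresh content.
        self._store.pop(title, None)


def render_from_db(title):
    # Stand-in for the database queries and parsing a real wiki would do.
    return f"<html>{title}</html>"


cache = PageCache(render_from_db)
for title in ["Main_Page", "Main_Page", "Squid", "Main_Page"]:
    cache.get(title)
print(cache.hits, cache.misses)
```

      Repeated requests for one popular page cost a single render, while every distinct page and every edit (via `invalidate`) reaches the backend.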

    4. Re:required? by midom · · Score: 2, Informative

      Well, first of all, everything grows. The number of users increases all the time; it doubles every two or three months. The number of pageviews increases as well. And last but not least, there are more and more, bigger and bigger articles with more and more history. Wikipedia is growing, and it is running on really low-budget hardware. And... every time we make the site run faster, more users come and use up the available resources. Therefore, we can do two things: optimize our software platform and increase our hardware capacity.

      There are questions about why boxes are being added in Seoul. We're trying to bring content as close to people as possible. Light speed means slow in the information age. We already have a donated cluster in Amsterdam, which serves all of Europe, and we want to have the same or better capabilities in Asia. And sure, we're constantly improving our main cluster in the USA.

      Why do we really need that many CPUs? Wiki means a website with content that could have been edited a second ago. It can't be desynced, as editing outdated content isn't that sane. Also, it doesn't simply serve HTML content. In a wiki all documents are related, links are tracked, document quality is observed, etc. Therefore, for a task that might look quite simple, we need quite a lot of servers. We could serve those poor 2500 requests / second (~1500 pageviews per second) with two or three web servers, but... hey, EDIT THIS PAGE.

    5. Re:required? by m50d · · Score: 2, Interesting

      Slashdot links barely touch the database. Any popular links are handled by the squid caches. It's the zillions of people all looking at different pages that stress the database.

      --
      I am trolling
    6. Re:required? by Canadian_Daemon · · Score: 4, Interesting

      According to Netcraft, /. is ranked 33, while Wikipedia is ranked 117.

      --
      This sig is definitive. Reality is frequently inaccurate.
    7. Re:required? by bobbozzo · · Score: 2, Informative

      FWIW, they have Squid caches in front of the web farm, so there are cached static copies of busy pages.

      --
      Nothing to see here; Move along.
    8. Re:required? by Pendersempai · · Score: 5, Interesting

      I saw a presentation by Jimbo Wales in which he compared the readership of Wikipedia, Slashdot, and NYTimes.com. Wikipedia recently passed NYTimes, and slashdot doesn't even compare. In fact, he noted with something of a smile that Wikipedia would probably bring Slashdot to its knees with a front-page link.

      Slashdot ain't got squat on Wikipedia.

  4. Some companies are just too cool for words by Council · · Score: 4, Funny

    So it seems now that Wikipedia has more street cred than either Yahoo OR Google, since they're both clamoring to be seen as being in support.

    And with Google at approximately 211 street cred units as of the last survey, Wikipedia is definitely doing well.

    --
    xkcd.com - a webcomic of mathematics, love, and language.
  5. wikihardware by mz001b · · Score: 4, Funny

    The trouble of course with wiki-hardware is that the system administration is left to the community.

    1. Re:wikihardware by hunterx11 · · Score: 4, Insightful

      That is not an example of a bad article at all. It is not a GNAA troll, but rather a descriptive and informative article on what the GNAA is. Wikipedia has many faults, but the fact that it covers topics that other encyclopedias don't is one of its strengths. If you are doing serious work, Wikipedia is not the place to go, but neither is Britannica.

      --
      English is easier said than done.
    2. Re:wikihardware by anthony_dipierro · · Score: 2, Insightful

      I mean, I've yet to see this in the Britannica yet, and that's why I use the Britannica more often than Wikipedia for serious work.


      So you use Britannica more often than Wikipedia for serious work because Wikipedia contains articles on things that Britannica doesn't? That doesn't make much sense to me. If your "serious work" doesn't have anything to do with the GNAA, then you're not going to type GNAA into Wikipedia's search field, and you're never going to see that page in the first place.

    3. Re:wikihardware by Vegeta99 · · Score: 4, Informative

      Also, in addition to HunterX11's comment, Wikipedia articles almost always have relevant links and sources listed. It's meant more as a starting point for research - it gives you a rather verbose summary of the information, and then points you in the right direction for more involved, serious research.

      If you use it correctly, you won't find a better encyclopedia anywhere.

    4. Re:wikihardware by alienw · · Score: 2, Insightful

      The whole point of a Wiki is that people can write about what they feel is important. This is what makes Wikipedia so good -- many articles are written by people who know what they are talking about first-hand, and not English majors trying to explain how something works. The quality of the writing isn't as good as a commercial encyclopedia, but the quality of the information is much higher.

      Considering that the GNAA has been trolling slashdot (which is one of the most popular sites on the 'net) for the last few years, they may as well have an article of their own. If someone took the time to write and update it, it's important enough to be in there. Plus, that page is hilarious!

  6. Re:This sounds like by Karma+Sucks · · Score: 4, Insightful

    All Google has done is hand-waving so far.

    On the other hand, Yahoo has been one of the earliest Wikipedia supporters according to TFA.

    --
    (Please browse at -1 to read this comment.)
  7. FYI: Those Are Opteron Servers by Anonymous Coward · · Score: 4, Informative

    Not Athlon

  8. Heh by aftk2 · · Score: 4, Insightful

    Only on Slashdot would Yahoo's donation be compared unfavorably to Google, when Yahoo has actually provided something, and Google has merely mentioned it.

    --
    concrete5: a cms made for marketing, but strong enough for geeks.
  9. South Korea? by s0rbix · · Score: 2, Interesting

    Does anyone know why they are being set up in South Korea?

    1. Re:South Korea? by Jon+Chatow · · Score: 4, Informative

      'Cos Yahoo! offered to host them at their facility there, and our overall global reach has a bit of a paucity in Asia.

      --
      James F.
    2. Re:South Korea? by commodoresloat · · Score: 4, Funny

      Because only old people will administer the servers.

  10. Re:This sounds like by LiquidCoooled · · Score: 4, Funny

    There's nothing wrong with hand-waving.
    Obi-wan did ok by it.

    --
    liqbase :: faster than paper
  11. faulty facts in summary by TERdON · · Score: 3, Interesting
    As I noticed, the summary says dual Athlon, and those aren't really current anymore (as far as I know the Opteron was introduced about two years ago). AMD did make Athlon MP processors earlier, which was why I reacted (why buy three-year-old tech?).

    The server hardware spec link said the "athlons" in fact are opterons. *sigh*

    --
    I have a really elegant proof for Fermat's last theorem. If this sig was only a bit longer...
  12. All over little ol' me! by stimpleton · · Score: 5, Funny

    This is sort of like those school yard spats over a girl.

    Wiki is the girl. Google and Yahoo are the two guys.

    My mother's advice (that I got many years back) surely applies to this situation:

    "Stay away from that little trollop! Anyone that causes a fight is not worth it."

    Of course, I did hang round that girl. Pretty wee thing. It was all fruitless of course.

    Bitch! You whore Wiki!

    *begins to cry*

    --

    In post Patriot Act America, the library books scan you.
    1. Re:All over little ol' me! by zobier · · Score: 2, Funny
      Interesting, the analogy between trollop and wiki is fitting:

      Everyone puts their bit in.

      --
      Me lost me cookie at the disco.
  13. Re:This sounds like by Jugalator · · Score: 3, Insightful

    If Yahoo is a "me too" move, Google was a "look how good we are" move.
    Regardless, it's good for not only the administrators, but obviously for their large user base too.

    --
    Beware: In C++, your friends can see your privates!
  14. Yahoo/Google war by BonoLeBonobo · · Score: 3, Interesting

    Seems to be a war to be the best "opensource" helper. See Google wants to help Wikipedia, Yahoo helps Wikipedia, Google makes Google Summer of Code ...

    What's next ;-) ?

    --
    Bonjour !
    1. Re:Yahoo/Google war by kihjin · · Score: 2, Funny
      In other news...
      • Microsoft announced today the Windows platform source code will be released onto SourceForge under an OSI-compatible license...
      • Duke Nukem Forever date was moved forward, and not back. According to developers, the game is complete, they are "just trying to beat it first"...
      --
      This slashdot-related signature is a stub. You can help kihjin by expanding it.
  15. Re:Wikipedia's total bandwidth ? by BReflection · · Score: 2, Informative

    Up until recently when they moved to a new co-lo this data was out there, but it is unfortunately no longer available. I can say as a fact though that they are currently pushing out about 17 terabytes per month and growing strong. There's a bandwidth graph and instructions to read it on this page of my site.

    --
    python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
  16. Re:Bad idea by marcello_dl · · Score: 5, Insightful

    True, but think about it, what is the truth for non technical things?

    Before wiki and the 'net in general made content come alive, coming from whatever source, all such discussions were lost. The winner of the argument, or more likely, the one with the arguments that were more pleasing to the ones in charge, would win and get published and later become part of what is taught in schools.

    With Wikipedia the argument is part of the content, and being critical of what you read is a good exercise for the mind.

    --
    ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
  17. Uh, hardly ... by dustmite · · Score: 4, Informative

    RTFA - it is the Wikipedia guys who are holding up Google's donation, not Google:

    "Wikimedia's planned facilities in Amsterdam (The Netherlands), Belgium, and Asia are not online yet, so it would be premature at this juncture to ask Google for something specific when we don't yet have good technical knowledge of what we will need in the coming months following the introduction of these new facilities. Google are eager to help us, and Wikimedia are eager to accept their help, but the Board want to be good stewards of donor money, and this requires them to move carefully"

  18. Re:Wikipedia's total bandwidth ? by Jamesday · · Score: 3, Interesting
    Averaging 60-70 megabits per second over a whole month. Peaks at 320 megabits per second in extreme cases. Typical daily peaks in the 120 megabit per second range. 6 months ago it was more than 200 million database queries per day and it's probably several times that today.
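
    As a rough cross-check of these figures, a sustained average rate can be converted into monthly transfer volume. This sketch assumes a 30-day month and decimal units (1 terabyte = 10^12 bytes, 1 megabit = 10^6 bits); the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_terabytes(avg_megabits_per_second, days=30):
    """Data moved in `days` days at a sustained average rate, in TB."""
    bytes_per_second = avg_megabits_per_second * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


def average_megabits(terabytes_per_month, days=30):
    """Sustained average rate, in Mbit/s, implied by a monthly volume."""
    bytes_per_second = terabytes_per_month * 1e12 / (SECONDS_PER_DAY * days)
    return bytes_per_second * 8 / 1_000_000


for rate in (60, 70):
    print(f"{rate} Mbit/s is about {monthly_terabytes(rate):.1f} TB/month")
print(f"17 TB/month is about {average_megabits(17):.0f} Mbit/s")
```

    Under those assumptions, 60-70 megabits per second works out to roughly 19-23 terabytes per month, while the 17 terabytes per month quoted elsewhere in this thread corresponds to an average of a little over 50 megabits per second, so the two figures are in the same range.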

    I'm wondering about setting up a network of boxes running the Coral software. Those have built in fault tolerance so it wouldn't take lots of admin work and would allow accepting many small bandwidth offers, in countries with comparatively low traffic. Makes most content even closer to the end users and spreads the bandwidth load around. Nothing actually happening on this front yet, though.

    A very large number of places with full database servers and page builders, like this Yahoo announcement, would have too much admin overhead - 3-6 of those places is about right.

    P2P is a security problem. People can always modify P2P programs to add nasty content and Wikipedia has already seen people trying to upload that and has filters in place to catch and block some things.

  19. Re:This sounds like by Jamesday · · Score: 4, Informative
    Yahoo was first: about a year before the Google thing, Yahoo arranged some content linking. Then ON THE SAME DAY both Google and Yahoo agreed to provide hardware. The Google news leaked, making it appear as though Google was first when it was actually as close to simultaneous as these things can be. Each is being accepted and used in the order which works out most conveniently for Wikipedia.

    Both Yahoo and Google deserve approximately equal kudos for being helpful to the projects. Thanks!

  20. hardware compensating for poor software by njyoder · · Score: 3, Interesting

    This is a classic case of considering the hardware to be the problem rather than the software. The software has serious issues when it comes to performance and the developers are very slow to address it. Hell, Tim Starling, a lead developer, even stated that one of the design goals of the MediaWiki software was to spend as little time as possible developing it. I kid you not, that's paraphrasing something (with NO exaggeration) that was said in a presentation document which I can find if anyone doesn't believe me.

    I've heard some whining from some of the developers because they didn't have a ready made solution for certain things, meaning they would have to put actual *effort* into making their own. The idea of writing glue code (to C code) to make up for a feature lacking in existing php libraries was considered an abhorrent thing.

    Their best response to me pointing out flaws in their "development philosophy" was to then retort with the oh-so-clever "well why don't you write something better yourself?" Of course, that phrase is just a code word for "we know it sucks and we're just not willing to put all the extra effort into rewriting major portions of it." Really, it's sad when you have to define your software in terms of someone else (your opponent specifically) not writing something better.

    These aren't just unfounded complaints either. The developers have often complained that the existing implementation (and especially the choice to write the original code in PHP) needs to be gotten rid of. They've said it has "everything and the kitchen sink" and that it degrades performance, but aren't trying that hard to get rid of it. They know this as a matter of fact through testing--MediaWiki has a massive overhead in setup time compared to other wiki software.

    Not just that, but the Wikipedia admins are all volunteers and aren't exactly the cream of the crop. They took them as volunteers since they were the best ones to devote that much time to it and unfortunately that means they're mediocre and they REALLY are not experienced for such a high traffic website.

    If they actually had a paid full time admin who had considerable background in sites like this, you'd suddenly see a massive drop in down time and other problems.

    1. Re:hardware compensating for poor software by GerardM · · Score: 2, Interesting

      Again, you do not know what you are talking about. Jimbo is not a paid employee of the Wikimedia Foundation.

      The difference with my POV and yours is that I put my money where my mouth is.

      There are many ways of looking at the quality of developers. I am sure that there will be few websites running as cheaply as the WMF does. Just compare the hardware costs for instance. Another way is to look at the number of developers and at the amount of traffic that is served. I am sorry but you provide no metrics to back up your claim that the quality is substandard.

      This is not to say that the quality of the code could not be improved. I am not convinced that money is the only answer. The big advantage that the WMF developers have is intrinsic motivation. Often missing with hired hackers.

      You characterise one of my arguments as a strawman. Well, you hide behind the back of someone else. That is cheap.

      About corporate sponsorship; we have some of it. We have paid for projects, but then again, how would you know? The argument that some people are against it is just that. When development is needed and someone is willing to pay for it, it can be done.

      Your suggestion that professional programmers do not work with trial and error is... based on what? Often software does not work as advertised in manuals. Certainly when you are scaling outside known territory you need new methods, clever hacks.

      When the number of servers goes up, you find that at the same time the usage of the servers goes up. The demand for information is such that throwing hardware at the problem gives an equal amount of new users. As to software, the last software versions have made this growth possible. The notion that it is only hardware that is seen as a solution is false if you consider the statistics.

      Then again, why am I arguing - you know better.. this is slashdot :)

      Thanks,
      GerardM

  21. Re:Wiki Quality by cheesybagel · · Score: 2, Informative

    Hello? Wikipedia keeps all versions of an article. Just refer to the specific version in that case.
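
    One way to refer to a specific version is a link that carries the revision's ID. The sketch below assumes the `index.php?title=...&oldid=...` URL form that MediaWiki sites use for old revisions; the title and the revision ID 123456 are made-up placeholders, not a real revision.

```python
from urllib.parse import urlencode


def permalink(title, revision_id, base="https://en.wikipedia.org/w/index.php"):
    """Build a link to one fixed revision of an article (illustrative)."""
    query = urlencode({"title": title, "oldid": revision_id})
    return f"{base}?{query}"


# Placeholder values: 123456 is not a real revision ID.
print(permalink("Slashdot", 123456))
```

    Citing a link of this kind, rather than the bare article name, pins the reference to the text as it stood at that revision.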

  22. Re:Incorrect processor, but still AMD. by RzUpAnmsCwrds · · Score: 2, Informative

    No, you're thinking of the DL585, which is quad-Opteron (up to 8-cores). The DL385 is dual-processor (though you can install dual-core Opterons to get 4 total cores).

  23. Wikipedia Servers by silverz · · Score: 2, Informative

    Here is their servers list.

  24. This should draw some ire... by aminorex · · Score: 2, Informative

    Since I presumably have moderation to burn, I'll say frankly that I'm appalled. Wikipedia is enormously valuable as a resource in objective domains such as hard science and mathematics, but its articles in politically and culturally sensitive areas are abysmal reflections of popular delusion and political correctness that do an enormous disservice to us all. The cockles of my heart are not warmed.

    --
    -I like my women like I like my tea: green-
  25. Someone mod this up. by ta+bu+shi+da+yu · · Score: 3, Informative

    That's a Wikipedia server admin that's speaking.

    --
    XML is like violence. If it doesn't solve the problem, use more.