Wikipedia On the Brink? Or Crying Wolf?

Re:I really doubt it. by ZachPruckowski · 2007-02-10 05:10 · Score: 4, Informative

Downloads of all the Wikimedia Projects. You need to do a lot of DB work (XML -> SQL conversion, importing, rebuilding tables, etc.)
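For a feel of what the XML -> SQL step involves, here is a rough Python sketch. It is not MediaWiki's own import path: the `pages` table, the `load_pages` helper, and the inline sample dump are all made up for illustration, and SQLite stands in for the real database. Real dumps are much larger and the real schema has many more tables.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file (hypothetical content, not real dump data).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <revision><text>Hello, world.</text></revision>
  </page>
  <page>
    <title>Another page</title>
    <revision><text>More text.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def load_pages(xml_stream, conn):
    """Stream <page> elements into a hypothetical two-column 'pages' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT, body TEXT)")
    count = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, body = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                body = child.text or ""
        conn.execute("INSERT INTO pages VALUES (?, ?)", (title, body))
        count += 1
        elem.clear()  # release the page that has just been stored
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
imported = load_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8")), conn)
print(imported, conn.execute("SELECT title FROM pages ORDER BY title").fetchall())
```

Streaming with `iterparse` rather than loading the whole file matters here, since a full dump will not fit comfortably in memory.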

The issue is simply that massive servers are not cheap. Wikimedia is already at 100+ servers, and they are barely getting by. They could spend half a million on servers and still have a wish-list. And bandwidth isn't cheap. They get a charity discount, and a bulk discount, but it's still gigabytes and gigabytes a day.

Re:I really doubt it. by Zorglub1234 · 2007-02-10 05:12 · Score: 3, Informative

Didn't the Wikimedia Foundation use to provide a way for anyone to download the entire 25GB+ database for Wikipedia?

http://download.wikipedia.org/ is what you are looking for; you can get monthly database dumps for all the wikis, containing XML files with the articles (or other meta-data, depending on what you are looking for).

Zorglub

Re:Google will fund them if nec. by heroofhyr · 2007-02-10 05:16 · Score: 4, Informative

Why don't you save yourself some time and just get a Wikipedia search bar for your browser? I used to do the same thing, but got tired of going through a Google search just to wind up clicking on the Wikipedia entry link anyway. Might as well spare yourself the extra steps and have a direct Wikipedia search in the corner of your browser window.

For Firefox:
https://addons.mozilla.org/search-engines.php

For Opera:
http://widgets.opera.com/search/?search=wikipedia&x=0&y=0&scope=all

For Internet Explorer:
http://www.google.com/search?q=help+me+i'm+still+using+internet+explorer&btnG=Google+Search

--
brandelf: invalid ELF type 'KEEBLER'

Google once offered to host Wikipedia by vakuona · 2007-02-10 05:21 · Score: 2, Informative

Whatever happened to that?

Re:Google once offered to host Wikipedia by Anonymous Coward · 2007-02-10 05:30 · Score: 5, Informative

I think the offer wasn't quite "no strings". It is not that I am 100% sure there were actually any strings - but rather that Google wouldn't or couldn't guarantee a few things.

Yahoo offered servers as part of the Asia cluster and said "have them - you can use them as you wish" and the Wikimedia Foundation said thanks - and they are happily in use. So the precedent of using such help has been set - I presume that Google weren't offering something quite as simple.

The Wikimedia Foundation were being wined and dined by a few tech suitors a year or so ago - but I think the heat has gone out of any relationships due to the very uncompromising stance (e.g. the China situation) that Wikimedia takes (compared to all the $$ merchants who happily censor their Chinese content as the PRC desires) - no content compromises, no independence compromises and no advertising compromises - that is not what the tech companies want to hear.

Re:I really doubt it. by Blimey85 · 2007-02-10 05:21 · Score: 2, Informative

I'm sure he meant this page: http://en.wikipedia.org/wiki/The_Star_Wars_Holiday_Special

--
How is it that one careless match can start a forest fire, but it takes a whole box to start a campfire?

Hardware, people, bandwidth. by Short+Circuit · 2007-02-10 05:34 · Score: 5, Informative

I found a copy of their 2005 Q4 budget. Multiply that by four, and you have a rough approximation of how much it costs to run Wikimedia.

It looks like hardware is their single largest expense, at $190,000. Personnel takes a distant second place at $33,000. Bandwidth (well, hosting) takes third, at $24,000.

Also, a note at the bottom:

So far this is little more than a minimal budget, meaning a budget designed to pretty much just keep the foundation going. What is not included are special projects (content and/or software). Please include ideas for that on the talk page. --Daniel Mayer 22:39, 18 September 2005 (UTC)
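The multiply-by-four estimate above can be sketched in a few lines of Python. This only covers the three line items quoted in the comment, so the total is a floor for those items rather than the full budget; the names `QUARTERLY_COSTS` and `annualize` are just for illustration.

```python
# Quarterly figures quoted in the comment above (2005 Q4, US dollars).
QUARTERLY_COSTS = {
    "hardware": 190_000,
    "personnel": 33_000,
    "hosting": 24_000,
}

QUARTERS_PER_YEAR = 4


def annualize(quarterly):
    """Scale each quarterly cost to a rough yearly figure."""
    return {item: cost * QUARTERS_PER_YEAR for item, cost in quarterly.items()}


annual = annualize(QUARTERLY_COSTS)
annual_total = sum(annual.values())

for item, cost in annual.items():
    share = cost / annual_total
    print(f"{item:<10} ${cost:>9,} ({share:.0%} of these three items)")
print(f"{'total':<10} ${annual_total:>9,}")
```

On these numbers the three items alone come to $988,000 a year, with hardware at $760,000 of that.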

--
tasks(723) drafts(105) languages(484) examples(29106)

Re:Hardware, people, bandwidth. by Anonymous Coward · 2007-02-10 09:08 · Score: 1, Informative

Their most recent budget is here. It costs around $75,000 per month to run the site, $12,000 for bandwidth.

Re:Hardware, people, bandwidth. by timeOday · 2007-02-10 09:23 · Score: 2, Informative

Thanks for the link AC. According to that, bandwidth is 17% of the budget. Throw in hosting as well and you're up to 35%.

Re:If you're short on cash... by Rogerborg · 2007-02-10 05:40 · Score: 2, Informative

Old adage: you have to spend money in order to get people to give you the money that they made.

It's punchier in the original Klingon, I grant you.

--
If you were blocking sigs, you wouldn't have to read this.

Re:I really doubt it. by truthsearch · 2007-02-10 05:42 · Score: 4, Informative

Bandwidth is cheap as dirt.

So you have experience with very popular web sites, do you? When you need high performance consistent bandwidth it is not cheap. I worked on a popular site whose bill was in the tens of thousands of dollars a month. Wikipedia is extremely fast so you can bet they're paying top dollar.

--
Developers: We can use your help.

Re:Google will fund them if nec. by Falesh · 2007-02-10 05:51 · Score: 4, Informative

Alternatively, for Opera you could go to Wikipedia, right-click inside the search box, then select "create search". Once you have done that, if you want to search Wikipedia, simply enter "w" followed by the search terms into the address bar.

Re:Wikipedia and Citizendium by Raindance · 2007-02-10 06:05 · Score: 2, Informative

I do already contribute *plenty* to citizendium, by contributing articles and edits and money to wikipedia to fund you guys mirroring their content.

You do not, because we do not mirror Wikipedia's content. We unforked weeks ago.

Re:I really doubt it. by Doc+Ruby · 2007-02-10 06:25 · Score: 2, Informative

The data is available as XML, but to clone the site you need the MediaWiki app.

--
make install -not war

Re:Wikipedia and Citizendium by CRCulver · 2007-02-10 06:41 · Score: 2, Informative

I do already contribute *plenty* to citizendium, by contributing articles and edits and money to wikipedia to fund you guys mirroring their content.

Citizendium unforked from Wikipedia some weeks ago. And no articles will pass the approval process on Citizendium unless they can stand up to the rigour and consistency that scholars are used to in their professional work, which means that most Approved articles will bear little resemblance to their Wikipedia counterparts, which are inconsistent in tone, styling, and references even in the best of cases.

Re:I really doubt it. by Iron+Condor · 2007-02-10 08:25 · Score: 2, Informative

No offense, but if your bandwidth is costing you tens of thousands of dollars, you're doing something wrong.

No offense, but get back to us when you leave the minor leagues and work on real corporate web sites for the Fortune 50. You're smoking crack if you think they don't spend tens of thousands per month on bandwidth.

Re-read his comment: he never claimed that they don't spend tens of thousands of dollars on bandwidth. He said they're doing something wrong when they spend tens of thousands of dollars on bandwidth.

Where and how you procure bandwidth is a business decision, and business folks aren't exactly the brightest of folks when it comes to technology. Yes, I have worked for an internet company that went through insane amounts of data and yes, they paid dearly for bandwidth and yes, they could easily have gotten the same amount for 1/10th of the price. But the business manager knew someone who swore Rackspace was the bomb and thus Rackspace it was, at Rackspace's prices. Never mind that there's a rash of very cheap data centers much closer to the backbone who'll give you unmetered BW for a factor of five less than what we paid.

(Of course that "friend" ran a business that relied much more on HW uptime than data throughput and for him Rackspace might have been the right choice. For us, it wasn't.)

I agree with the statement that you're doing something wrong if you're paying tens of thousands of $$ for BW a month. That's just not what it costs.

--
We're all born with nothing.
If you die in debt, you're ahead.

old numbers by Stone+Rhino · 2007-02-10 08:25 · Score: 2, Informative

That budget is a year and a half old; Wikipedia's traffic has increased more than tenfold over that period.

--
Remember, there were no nuclear weapons before women were allowed to vote.

Re:I really doubt it. by Iron+Condor · 2007-02-10 08:33 · Score: 2, Informative

Call Cogent up and ask how much it is for a 10GB/sec connection.

Whoa there -- either we're living in entirely different worlds or there's a real ambiguity in the term "bandwidth" here. Where I come from, BW was never measured in "per second" or any such thing. A number like "GB/s" would have been called "throughput". When we used the term "bandwidth" it meant something like the aggregate amount of data shipped in or out over the course of a month. In essence the integral over the number you're quoting.

I've never dealt with a company that put limits on the amount of data I can move around "per second" or some such -- that's home-broadband thinking. Or has the business changed so much that businesses are now running their own servers in their own buildings and are paying for the connection from there to the trunk? I don't know about Wikimedia, but most "popular websites" aren't in the business of running hardware - they're in the business of running a business.

Is this another of those fabled "paradigm shifts" of the last couple years?
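To make the two meanings of "bandwidth" above concrete, here is a small Python sketch converting between a sustained rate and a monthly total. It assumes a 30-day month and decimal units, and it takes the quoted "10GB/sec" literally as gigabytes per second; the 100 TB figure is purely hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_transfer_bytes(rate_bytes_per_s):
    """Total bytes moved in a month at a constant sustained rate."""
    return rate_bytes_per_s * SECONDS_PER_MONTH


def average_rate_bits_per_s(monthly_bytes):
    """Average line rate, in bits per second, implied by a monthly total."""
    return monthly_bytes * 8 / SECONDS_PER_MONTH


TB = 10**12  # decimal terabyte
GB = 10**9   # decimal gigabyte

# The "10GB/sec" figure quoted above, held flat out for a whole month.
total = monthly_transfer_bytes(10 * GB)
print(f"{total / TB:,.0f} TB per month")

# Going the other way: a hypothetical 100 TB monthly total as an average rate.
rate = average_rate_bits_per_s(100 * TB)
print(f"{rate / 10**6:,.0f} Mbit/s on average")
```

With a constant rate the "integral" is just rate times time. Real traffic is bursty, so the peak rate a site needs will sit well above the monthly average.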

--
We're all born with nothing.
If you die in debt, you're ahead.

Re:They just got $1 Million by James+Walsh · 2007-02-10 09:10 · Score: 1, Informative

They paid more than $100,000 a year in salaries to their growing staff? And built up a $500,000 cash reserve? http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2006-12-11/Financial_audit http://upload.wikimedia.org/wikipedia/foundation/2/28/Wikimedia_2006_fs.pdf

Forgot the rômaji? by tepples · 2007-02-10 09:35 · Score: 2, Informative

So instead of asking you politely, they just forcibly ban you when they see you trying to contribute?

They did ask you politely. The signup page links to the article Wikipedia:Username, which gives the romanization policy adopted by the English Wikipedia.

I was contributing in English, not moonspeak. It was my username that was in Japanese (and nothing impolite, either).

Was it properly romanized?

Re:Forgot the rômaji? by tepples · 2007-02-10 10:56 · Score: 2, Informative

Why should I, or anyone else, expect to be banned for being born with a Japanese name?

You were not banned; you were blocked. There is a difference. A block is a technical mechanism used against usernames and IP addresses. A ban is a social mechanism used against people. The blocking notice should have suggested changing your username to a romanized version. From Wikipedia:Username:

If you notice someone whose username is inappropriate, please ask them on their talk page to change their username.

If you feel that an administrator applied procedure incorrectly, do you remember the username of the administrator who blocked you?

Re:Almost All of Us by tepples · 2007-02-10 09:44 · Score: 2, Informative

Try reading the Wikipedia article on false dichotomies, which redirects to an article titled "false dilemma." [...] Gee Whiz, Beaver, the "references" section of the Wikipedia article only points to other Wikipedia articles.

What you're seeing is an empty references section followed by a navigation box. Yes, this is a defect in the article. I just added a tag to the page to bring this specific defect to other editors' attention.

Re:I really doubt it. by vidarh · 2007-02-10 09:59 · Score: 2, Informative

For $400k/year I can lease 100 dual-core, dual-CPU Woodcrest-based servers with 2GB RAM and a couple of hundred GB of storage each, including 300 terabytes of monthly transfer. For $290k/year I'd get 100 dual-core, single-CPU Woodcrest-based servers including about 250 terabytes of monthly transfer. That's a commercial rate without shopping around and without negotiating any sponsorships or anything.

A page on the Wikimedia Foundation's site indicates around 200 terabytes a month, but is marked as outdated - I have no idea how much it's grown since then. So yes, it's not cheap, but you have to wonder if they couldn't get a couple of hundred corporate sponsors to commit to $500 or so a month to pay for 1-3 servers each and get their logo as a sponsor on the pages served from that server. I know a lot of people don't want ads on Wikipedia, but a single, discreet, static and easily blockable image wouldn't be very intrusive. 3TB/month of bandwidth + a server can be leased for a few hundred dollars.
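The per-server numbers behind that last sentence can be checked with a quick Python sketch. The quotes are the ones given above; "a couple of hundred" sponsors is taken as 200, which is an assumed round number, and the dictionary keys are just labels for illustration.

```python
SERVERS = 100

# Yearly lease quotes and bundled monthly transfer from the comment above.
QUOTES = {
    "dual-CPU": {"usd_per_year": 400_000, "tb_per_month": 300},
    "single-CPU": {"usd_per_year": 290_000, "tb_per_month": 250},
}


def per_server_monthly(quote, servers=SERVERS):
    """Return (dollars, terabytes) per server per month for a quote."""
    usd = quote["usd_per_year"] / 12 / servers
    tb = quote["tb_per_month"] / servers
    return usd, tb


for name, quote in QUOTES.items():
    usd, tb = per_server_monthly(quote)
    print(f"{name}: about ${usd:,.0f} and {tb:.1f} TB per server per month")

# Sponsorship idea: 200 sponsors (assumed) at $500 a month each.
SPONSORS = 200
sponsor_income_per_year = SPONSORS * 500 * 12
print(f"sponsors: ${sponsor_income_per_year:,} per year")
```

The dual-CPU quote works out to roughly $333 and 3 TB per server per month, which is where the "few hundred dollars" figure comes from.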

Re:I really doubt it. by MostAwesomeDude · 2007-02-10 13:54 · Score: 2, Informative

Yo. Well, it's pretty much as you say. Wikipedia currently has about 200 servers, most of which are dedicated to a single task. There's a web cluster running Apache with PHP (with eAccelerator, I believe) that runs the MediaWiki software and serves requests. (That is about 100 servers, if I remember right.) There's a database cluster which runs the MySQL databases; one cluster is English, a few other languages have dedicated boxes (Chinese, Korean, Spanish, I think...), and another cluster handles all other languages. There's also a Nagios box somewhere in there that monitors the whole shebang. Everything is situated behind a set of Squids, like you suggested. In fact, three of four requests to Wikipedia are hitting a Squid, not an Apache server. Also, some of the Apaches have memcached.

Wikipedia is indeed just text and images, but even with the cache, the entire thing has to run a disturbingly large number of edits through a database and then retrieve any one of over 1.6 million articles anytime it's requested. The scale on which the software runs hurts my head, and I would imagine the guys at Wikimedia's server place have similar headaches hourly.
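A back-of-envelope Python sketch of what the Squid layer buys, using the "three of four requests" and "about 100 servers" figures from above. The total request rate is a hypothetical number picked for illustration, not a measured Wikipedia figure, and the even split across servers is a simplifying assumption.

```python
def backend_requests(total_requests, cache_hit_ratio):
    """Requests that miss the cache and must be served by the backend."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_requests * (1.0 - cache_hit_ratio)


SQUID_HIT_RATIO = 3 / 4   # "three of four requests" from the comment above
WEB_SERVERS = 100         # "about 100 servers, if I remember right"
TOTAL_REQ_PER_S = 20_000  # hypothetical figure, not a measured Wikipedia number

misses = backend_requests(TOTAL_REQ_PER_S, SQUID_HIT_RATIO)
print(f"{misses:,.0f} req/s reach Apache, about {misses / WEB_SERVERS:,.0f} per server")
```

Whatever the real request rate is, a 75% hit ratio means the Apache cluster only sees a quarter of it; without the Squids the same hardware would face four times the load.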

--
~ C.

Re:Dump MediaWiki by Teancum · 2007-02-11 11:57 · Score: 2, Informative

I got that you didn't like MediaWiki. Yeah, it is slow, but it is slow because of what it does.

As far as the actual studies involved, I would have to at the moment refer you to the Wikimedia development team directly, although I've seen some published values that do go into some details.

For general statistics of Wikimedia projects, I would have to refer you to http://stats.wikimedia.org/ that goes into some depth about individual projects and what the general demands on them are, including statistical summaries of leading contributors, growth of content, and edit counts that would certainly be of general interest in terms of trying to compare to other Wiki environments.

I also would like to mention that Erik Zachte, the person who has written this statistical summary mentioned above, has also gone into depth regarding general usage data where he has been given direct access to the Apache server logs and has noted areas that were critical for Wikimedia projects. Brion Vibber is also actively involved with these reviews, and several of these statistical summaries were noted among the internal developers lists, with hints of these studies being mentioned from time to time on other foundation mailing lists.

There have also been formal requests for performing this sort of statistical analysis by several university research teams that have been eager to get such a statistical set, which also prompted the WMF to establish specific guidelines for obtaining this sort of raw data.

Is this specific enough? I don't know right off hand besides these direct studies, but I do know there are others that do exist as well. Wikipedia is a heavily studied topic in part because much of the data is open and available, which gives some interesting sociological interpretations as well if studied through the lens of a statistical review. And there is enough raw data to come to conclusions that may not fit the traditional orthodoxy, so you can also tweak some noses at the same time.

The reason I mention MediaWiki's feature set is that you are (I'm presuming here) claiming that the Wikimedia Foundation is running out of money in part because it is foolishly spending money on server resources that could be run better had it only selected the proper wiki software. I am offering a rebuttal: this is hardly the case, and almost any other wiki software package (almost, because I can't claim absolute knowledge here) would die a horrible and nearly instant death if it had to deal with the same feature set and bandwidth issues that currently confront Wikipedia. Either that, or the other packages lack so many of the essential requirements for running Wikipedia that there is hardly room to justify a comparison based on only one single criterion: content distribution bandwidth on the CPU.

Slashdot Mirror
