Huge Traffic On Wikipedia's Non-Profit Budget

Impressive by locokamil · 2008-06-24 05:19 · Score: 4, Insightful

Given that their topic sites are generally in the top three for any search engine query, the volume of traffic they're dealing with (and the budget that they have!) is very impressive. I always thought that they had much beefier infrastructure than the article says.

Re:Impressive by VeNoM0619 · 2008-06-24 05:21 · Score: 4, Funny

Yes, and seeing how slashdot decided to try and slashdot them also helps...

--
Disclaimer: I am not god.
We may not be created equal
But we can be treated equal.
Re:Impressive by sm62704 · 2008-06-24 05:27 · Score: 3, Interesting

I was always impressed with how fast pages loaded, after seeing how small their operation is I'm even more impressed now!
Go to any newspaper, from the NYT to one in a smaller city (say, Springfield's State Journal-Register), and the difference in load times is HUGE. Probably has to do with all the ads served from third-party servers in the newspapers. What's the use of having a humungous server with giant pipes if your readers' pages have to wait for a flash ad served from a 486 powered by gerbils?
If I link to the SJR from one of my journals it slows down! I mean, I can see it if it's a front-page slashdotting of a little paper like that, but come on, a user journal?
And Wikipedia isn't all their servers serve; iinm the uncyclopedia shares servers. Impressive, indeed.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Impressive by Bandman · 2008-06-24 05:48 · Score: 3, Interesting

Yea, a single datacenter seems really risky, especially considering some of the shenanigans that have been going on

--
Check out my sysadmin blog!
Re:Impressive by Achromatic1978 · 2008-06-24 06:05 · Score: 4, Informative

Except there's not. There are data centers in Europe and Asia, too, including one at some Yahoo facilities - at least on this note, the article (or summary) is utterly wrong. Single datacenter? No.
Re:Impressive by Bandman · 2008-06-24 06:10 · Score: 2, Interesting

That would make a lot more sense.
Given the sheer number of people who access it, it seems like the perfect use for GSLB (global server load balancing).

--
Check out my sysadmin blog!
Re:Impressive by David+Gerard · 2008-06-24 07:03 · Score: 5, Informative

No, actually - the Wikimedia servers serve all Wikimedia projects (all the Wikipedias, Wikimedia Commons, all the other projects), but Uncyclopedia is part of Wikia, which is a private company owned by Jimmy Wales to do wikis and isn't actually linked to the Wikimedia Foundation in any way.

--
http://rocknerd.co.uk
Re:Impressive by David+Gerard · 2008-06-24 07:05 · Score: 4, Informative

Single database, though. All the databases for all the projects are in Tampa - one master for English Wikipedia and two for all the other 700+ Wikimedia projects.
(They tried running the databases for Asian languages from the Yahoo!-sponsored datacentre in Seoul for a while, but it didn't actually work much faster than it did with everything in Tampa.)

--
http://rocknerd.co.uk
Re:Impressive by sm62704 · 2008-06-24 07:40 · Score: 1

Thanks, that was informative.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Impressive by ElizabethGreene · 2008-06-24 09:27 · Score: 1

As one of the rare sorts that read TFA, I wish there was more detail. -ellie
Re:Impressive by kv9 · 2008-06-24 10:10 · Score: 4, Informative

I was always impressed with how fast pages loaded, after seeing how small their operation is I'm even more impressed now!
you can skip TFA entirely and look here for detailed info on their servers, locations, pictures, software, pretty graphs and charts. and lots more, just keep clicking.

--
Stop Computers/Cars Analogies on S
Re:Impressive by mcrbids · 2008-06-24 19:30 · Score: 2, Insightful

As somebody who has been serving the Internet for a good length of time, I remember when busy web servers serving a 10 Mb/s stream were "ultra-high capacity" with a Pentium II 350 MHz chip and 256 MB of RAM.
The reality is that today, if you pay any attention at all to performance and a reasonable architecture, modern commodity hardware has just utterly incredible delivery capacity. A cheap, 1U 4-core x86 with 8 GB of RAM and a couple of SCSI 10k drives can easily saturate a 1 Gb/s stream of static pages, or even dynamic pages if the core algo is reasonable. This server can cost about $2500 without too much trouble, and even with heavily database-driven applications, a couple of these can deliver an insane amount of traffic.
As an example, I use LAMP stack software to serve school districts. I went into one larger school with our software, and they had a half-dozen higher-end systems to serve a Filemaker Pro based application to their several hundred staff. Delays of 5 minutes or more were commonplace. Our computing cluster, consisting of four 4-core servers with SCSI drives, satisfied all their needs much faster than their existing solution, while simultaneously serving almost 100 other schools and school districts. Our software was cleaner and more efficient, and got a much bigger job done with greatly reduced resources.
LAPP (Linux/Apache/Postgres/PHP) can be damned efficient if you do it right.
So it really doesn't take much anymore to serve a huge audience if you pay attention to systemic efficiency. That Wikipedia can do so much with just 300 systems actually seems heavy to me - I'm surprised that they need that many! I'd personally guessed something like 20-50 servers total, with dynamic pages heavily cached as static files with some kind of expiration algorithm, along with some spendy communications hardware.
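The "dynamic pages cached with some kind of expiration" idea can be sketched in a few lines. This is only an illustration under assumptions: `PageCache`, `render_page`, and the 60-second TTL are hypothetical names and values, and it is not how Wikipedia's caching layer is actually built.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Cache rendered pages and re-render them only after a time-to-live expires."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render          # the expensive dynamic page generator
        self._ttl = ttl_seconds        # how long a rendered page stays valid
        self._clock = clock            # injectable so the expiry can be demonstrated
        self._entries: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: serve the stored copy
        html = self._render(path)      # missing or expired: render again
        self._entries[path] = (now + self._ttl, html)
        return html


# Hypothetical renderer that records how often it is really invoked.
render_calls = []

def render_page(path: str) -> str:
    render_calls.append(path)
    return f"<html>{path}</html>"

fake_now = [0.0]                       # a fake clock, advanced by hand
cache = PageCache(render_page, ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("/wiki/Main_Page")           # rendered
cache.get("/wiki/Main_Page")           # served from cache
fake_now[0] = 61.0
cache.get("/wiki/Main_Page")           # expired, rendered again
print(len(render_calls))               # number of real renders
```

The point of the design is that the expensive renderer runs once per page per TTL window, no matter how many readers ask for the page in between.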

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:Impressive by bloodninja · 2008-06-25 01:25 · Score: 1

Yes, and seeing how slashdot decided to try and slashdot them also helps...
Seriously, there are maybe fewer than five /.ings resulting in downed servers a month now. However, I regularly see websites 'suspended' after being linked on Digg / Reddit. Note that these seem to be websites that have exceeded a bandwidth limitation, not a server failure. I've only seen that maybe once or twice from /..

--
Lock the wife and the dog in the boot of the car.
Return one hour later.
Who's happy to see you?
Re:Impressive by Ihmhi · 2008-06-25 01:37 · Score: 1

just keep clicking.
"What is 'How to describe Wikipedia in three words', Alex. I'll take 'Ape Tit' for eight thousand."

--
Random Thoughts From A Diseased Mind (Not For Dummies)
Re:Impressive by Teancum · 2008-06-25 03:33 · Score: 1

I wouldn't say "isn't actually linked to the Wikimedia Foundation in any way".... it was "founded" by two members (at the time) of the board of trustees of the WMF. Wikia also provided for a time some technical support personnel, and also some financial gifts to the WMF. Bomis (another of Mr. Wales's companies) was even more tied into the WMF at one point.
While I'll agree that on a strictly legal basis there isn't any formal connection between Wikia and the WMF (other than Jimbo himself), there still are significant ties between Wikia projects and WMF sister projects, including a great many shared administrators and cooperative content development.
And I'd also mention that there seemed to be a more or less de facto policy that any idea for creating a new sister project was instead strongly encouraged to become a Wikia project. This is something I've always looked at with a wondering if there might not be a financial conflict of interest by Mr. Wales and his association with the WMF for policies like this.
Re:Impressive by David+Gerard · 2008-06-25 03:39 · Score: 1

Wikimedia and Wikia remain good friends with a pile of people in common (I'm one of the few people who's an admin both on Wikipedia and Uncyclopedia, for instance), but nevertheless operate completely separately.

"This is something I've always looked at with a wondering if there might not be a financial conflict of interest by Mr. Wales and his association with the WMF for policies like this."

Good luck with that. Let me know how you make out with it.

--
http://rocknerd.co.uk
Re:Impressive by Teancum · 2008-06-25 19:44 · Score: 1

"This is something I've always looked at with a wondering if there might not be a financial conflict of interest by Mr. Wales and his association with the WMF for policies like this."
Good luck with that. Let me know how you make out with it.
I should point out that prior to the creation of Wikia (Wikicities as it was originally called), the creation of Wikimedia sister projects was quite common, and several very interesting ideas were put forward that have been useful even to Wikipedia in a support role. Some of them have been closed down (such as the Klingon language version of Wikipedia and the 9/11/2001 project) and others are still struggling (Wikispecies).
The last sister project to gain approval from the WMF board was Wikiversity, and arguably even that project had its roots prior to the creation of the Wikimedia Foundation as well. My experience with the development of Wikiversity was that the WMF board simply couldn't ignore it due to such widespread support among other Wikimedia editors. In fact, Wikiversity is the only WMF sister project that has been developed under the current new project rules. The Wikimedia Incubator project is technically newer, but it was started in a manner that bypassed the new project development guidelines.
I can't get into the head of Mr. Wales, but it is amazing how such a dramatic policy change, from experimentation with all kinds of crazy ideas on how to use wiki-related data to a very straitjacketed approach to what might be acceptable for a sister project, happened once Wikia was started as a company. Wikia is a for-profit corporation, and they are selling advertisement banner ads and other things that a great many of those involved with Wikipedia find quite objectionable. It is here that I find the financial conflict of interest.
Re:Impressive by something_wicked_thi · 2008-06-25 21:51 · Score: 1

That's because Slashdot doesn't usually link to a GeoCities page.
By the way, you look the same as everyone else from where I'm standing. :-p

I've always wondered... by mnslinky · 2008-06-24 05:22 · Score: 4, Insightful

It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work. It's always nice having the newest/fastest systems out there, but it's rarely the reality.

Re:I've always wondered... by Anonymous Coward · 2008-06-24 06:06 · Score: 5, Funny

"It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work."
Since they are using LAMP, obviously they could save money by following Microsoft's "Get The Facts" advice!
Re:I've always wondered... by midom · 2008-06-24 06:43 · Score: 5, Informative

I covered most of Wikipedia's technology bits in my MySQL Conference presentation last year: http://dammit.lt/uc/workbook2007.pdf (that's quite a detailed report)
Re:I've always wondered... by rrohbeck · 2008-06-24 08:19 · Score: 1

It would be neat to have a deeper look at their budget to see how I can save money and boost performance at work. It's always nice having the newest/fastest systems out there, but it's rarely the reality.
RTFA? There are some pretty detailed docs linked in there.

--
thegodmovie.com - watch it

The power of low standards by Itninja · 2008-06-24 05:22 · Score: 4, Insightful

From TFA: "But losing a few seconds of changes doesn't destroy our business."

Our organization's databases (we're also a non-profit) get several thousand writes per second. Losing 'a few seconds' would mean potentially hundreds of users' record changes were lost. If that happened here, it would be a huge deal. If it happened regularly, it would destroy the business.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.

Re:The power of low standards by robbkidd · 2008-06-24 05:37 · Score: 5, Insightful

Okay. So pay attention to the sentence before the one you quoted which read, "I'm not suggesting you should follow how we do it."
Re:The power of low standards by Anonymous Coward · 2008-06-24 05:47 · Score: 5, Insightful

Don't be too harsh -- the standards are dependent on the application. Your application, by the nature of the information and its purposes, requires a different standard of reliability than Wikipedia does. You're certainly entitled to be proud of yourself for maintaining that standard.
But don't let that turn into being derogatory about the Wikipedia operation. Wikipedia has identified the correct standard for their application, and by doing so they have successfully avoided the costs and hassle of over-engineering. To each his own...
Re:The power of low standards by WaltBusterkeys · 2008-06-24 06:01 · Score: 4, Interesting

Exactly. A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that. Six nines works out to about 30 seconds of downtime per year.
It seems like Wikipedia is getting things right 99% of the time, or maybe even 99.9% of the time ("three nines"). That's a pretty low standard relative to how most companies do business.
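The "about 30 seconds" figure is easy to check. A minimal sketch, assuming a 365-day year and reading "N nines" as an availability of 1 - 10^-N; the function name is made up for illustration:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(nines: int) -> float:
    """Downtime allowed per year at an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines       # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


for n in (2, 3, 6):
    print(f"{n} nines: {downtime_seconds_per_year(n):,.1f} seconds per year")
```

Two nines allows roughly 3.65 days a year, three nines roughly 8.76 hours, and six nines roughly 31.5 seconds.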
Re:The power of low standards by ericspinder · 2008-06-24 06:12 · Score: 1

Losing 'a few seconds' would mean potentially hundreds of users' record changes were lost. If that happened here, it would be a huge deal.
If you don't deal with financial data, it's likely that even your business would survive should an event like that happen. Sure, if it happened all the time users would flee, but I haven't seen such problems at Wikipedia. He wasn't talking about doing it regularly, just that when disaster does strike, no pointy-haired guy appears to assign blame.

--
The grass is only greener, if you don't take care of your own lawn.
Re:The power of low standards by MinuteElectron · 2008-06-24 06:13 · Score: 2, Informative

Changes are never just lost; when an error does happen and the action cannot be completed, it is rejected and the user is notified so they can try what they were doing again. You have vastly overstated the severity of such issues.

--
MinuteElectron
Re:The power of low standards by Nkwe · 2008-06-24 06:21 · Score: 5, Informative

A bank requires "six nines" of performance (i.e., right 99.9999% of the time) and probably wants even better than that.
Banks don't require "six nines"; banks require that no data (data being money), once committed, get lost. The "nines" rating refers to the percentage of time a system is online, working, and available to its users. It does not refer to the percentage of acceptable data loss. It is acceptable for bank systems to have downtime, scheduled maintenance, or "closing periods" -- all of these eat into a "nines" rating, none of which lead to data loss.
Re:The power of low standards by WaltBusterkeys · 2008-06-24 06:34 · Score: 1

The nines can refer to both.
I agree that banks can't withstand data loss, but they can withstand data errors. If there's a 30-second period per year when data doesn't properly move, and that requires manual cleanup, that's acceptable.
Re:The power of low standards by midom · 2008-06-24 06:41 · Score: 1

that happens to us once every few years maybe ;-) the fact is that servers don't go down too often. --Domas
Re:The power of low standards by Waffle+Iron · 2008-06-24 06:46 · Score: 2, Insightful

Indeed. Some of us are old enough to remember the days of "banker's hours" and before ATMs, when banks used to make their customers deal with less than "one two" (20%) availability.
Re:The power of low standards by astrotek · 2008-06-24 06:59 · Score: 2, Insightful

That's amazing considering I get an error page on bank of america around 5% of the time if I move too quickly through the site.
Re:The power of low standards by Anonymous Coward · 2008-06-24 07:23 · Score: 2, Interesting

Right, banks actually traditionally used such techniques as planned downtime to allow for maintenance. The "banker's hours" allowed for a large period of time, daily, where little-to-no 'data' was changing in the system and the system could be 'balanced'.
Re:The power of low standards by Colonel+Korn · 2008-06-24 07:43 · Score: 1

That's amazing considering I get an error page on bank of america around 5% of the time if I move too quickly through the site.
My BoA error message rate is definitely above 10%.

--
"I zero-index my hamsters" - Willtor (147206)
Re:The power of low standards by Anonymous Coward · 2008-06-24 07:57 · Score: 1, Insightful

> A bank requires "six nines" of performance (i.e., right 99.9999% of the time)
Wrong. When I worked for Wachovia, a successful week was only one hour of downtime. Management was very happy with a 99.4% uptime average. Our scheduled maintenance windows were one hour per day which is a 96% uptime. As long as no mistakes were made with data and the downtime didn't happen during the day, management really just didn't care.
I now work for American Express, and our customer service system and customer web sites are commonly down for more than an hour per week. A few weeks ago, we had more than forty hours of downtime on a customer-facing web site, and no one lost their job. No one even got yelled at for it.
Slashdotters just have unrealistic expectations for uptime. In the real world, you have weekly maintenance schedules and a lot of downtime. Also, the cost of achieving five 9's is so great that it makes business sense to have reasonable requirements and expectations for availability.
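The uptime percentages quoted above follow directly from the downtime figures. A quick sketch of the arithmetic (the helper name is invented for illustration):

```python
HOURS_PER_WEEK = 7 * 24  # 168


def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Percentage of a period during which the system was up."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


weekly = availability_percent(1, HOURS_PER_WEEK)  # one hour down per week
daily = availability_percent(1, 24)               # one-hour maintenance window per day

print(f"1 hour/week down: {weekly:.1f}% uptime")
print(f"1 hour/day down:  {daily:.1f}% uptime")
```

One hour per week comes out at about 99.4%, and one hour per day at about 95.8%, which rounds to the 96% mentioned.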
Re:The power of low standards by AK+Marc · 2008-06-24 08:25 · Score: 5, Funny

Screw that, I want a bank with six twos of performance. 22.2222%. Of course, any number of nines is easy to achieve. Want six nines? 9.99999% is easy.

--
Learn to love Alaska
Re:The power of low standards by PMBjornerud · 2008-06-24 08:36 · Score: 3, Insightful

If there's a 30-second period per year when data doesn't properly move, and that requires manual cleanup, that's acceptable.
And if there is a 1-hour downtime, EVER, you just blew through the scheduled downtime for the next 120 years.
"Six nines" is meaningless. Unrealistic.
It is a promise that you cannot be hit by a single accident, fuckup, pissed-off-employee or act of god.
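To put a number on how far a single outage overdraws a six-nines budget, here is a rough sketch assuming a 365-day year; the function name is hypothetical:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # leap years ignored


def years_of_budget_used(outage_seconds: float, nines: int) -> float:
    """How many years of allowed downtime a single outage consumes."""
    yearly_budget = SECONDS_PER_YEAR * 10 ** -nines  # seconds allowed per year
    return outage_seconds / yearly_budget


one_hour = 60 * 60
print(f"{years_of_budget_used(one_hour, 6):.0f} years of six-nines budget")
```

A one-hour outage uses up roughly 114 years of six-nines allowance, the same ballpark as the 120 years quoted.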

--
I lost my sig.
Re:The power of low standards by Itninja · 2008-06-24 08:49 · Score: 1

Not so much. Sometimes the user's data is written to the bi file, only to become corrupted before that file can be added to the database. Layer 7 is blissfully unaware of this and presents no error. So the users continue to enter data only to find the following day that somebody's clock hours didn't get recorded. Can they enter them again? For 300 teachers? Only if they guess what the numbers were....

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:The power of low standards by az-saguaro · 2008-06-24 09:09 · Score: 3, Interesting

Your reasoning may be a bit specious. If your databases get "several thousand writes per second", it sounds like this may be massive underuse of your bandwidth - i.e. your servers or databases may be able to handle hundreds of thousands or millions of writes per second. If a few seconds were lost or went down, then the incoming traffic might get cached or queued, waiting for services to come back on line. Once the connection is re-established, the write backlog might take only a few seconds or a few fractions of a second to catch up and be back to real time. Users might be unaware of the whole thing, or they would re-log and try again, and there would be no perceptible throttle or bottleneck to data logging.
Any system that presses its bandwidth limits, any system that walks dangerously close to its top capacity, with no capacitances or reserves, is likely to be down quite a bit. A system such as yours, which hardly taxes its bandwidth at all (I am guessing) could certainly tolerate lost seconds. Admittedly, your system may have had problems like this in the past, and the system was upgraded to handle higher capacity. . . . Which is why Wikipedia no longer runs on just one machine.
It does sound as though Wikipedia may have found a sweet spot, balancing load against reserve capacity or bandwidth, for robust up-time versus economic efficiency. I am sure that this is a topic that computer and network engineers have studied exhaustively - perhaps someone else knows?
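The catch-up argument can be made concrete. A minimal sketch, assuming writes queued during the outage are replayed while new writes keep arriving at the same rate; the rates below are hypothetical, loosely matching the "several thousand" and "hundreds of thousands" figures above:

```python
def catch_up_seconds(outage_seconds: float, arrival_rate: float,
                     capacity: float) -> float:
    """Time needed to drain the backlog built up during an outage.

    arrival_rate and capacity are in writes per second. The backlog only
    shrinks at the spare rate (capacity - arrival_rate), because new writes
    keep coming in while the queue is being replayed.
    """
    if capacity <= arrival_rate:
        raise ValueError("no spare capacity: the backlog would never drain")
    backlog = outage_seconds * arrival_rate
    return backlog / (capacity - arrival_rate)


# Hypothetical numbers: a 5-second outage, 3,000 writes/s arriving,
# and a system able to absorb 300,000 writes/s.
print(f"{catch_up_seconds(5, 3_000, 300_000):.3f} s to catch up")
```

With that much headroom the backlog clears in about a twentieth of a second; as capacity approaches the arrival rate, the catch-up time grows without bound, which is the point about systems running close to their limits.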
Re:The power of low standards by Kingrames · 2008-06-24 09:15 · Score: 2, Funny

Screw that, it needs to be a prime number.
or at least irrational.

--
If you can read this, I forgot to post anonymously.
Re:The power of low standards by Anonymous Coward · 2008-06-24 09:40 · Score: 0

Look up clustering. You can achieve 100% service availability by clustering. There are also many other technologies that can help.
Re:The power of low standards by Qzukk · 2008-06-24 09:51 · Score: 2, Insightful

You can achieve 100% service availability by clustering
Is that where when I run "DROP TABLE reallyimportanttable;" it drops it on all the servers at once?

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:The power of low standards by BitZtream · 2008-06-24 09:56 · Score: 1

With a proper setup, redundancy, and given enough clusters, zero downtime is possible unless the entire world loses electricity.
Just because doing it is silly doesn't mean it's meaningless or impossible.
You don't have to promise not to be hit by a single accident, fuckup, pissed off employee or act of god, you just know that any one of those isn't going to be able to corrupt the entire system or take it down. You use RAIDs to make disks more reliable from a logical perspective. If one dies, the others keep the logical drive alive.
Do the same with datacenters and their applications and you can do a lot towards achieving '6 nines'.
Does anyone need to do it? No.
Does anyone actually do it? Probably not.
Does anyone claim to do it? Of course, but then again, my cable company also claims I get unlimited usage.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:The power of low standards by Tony+Hoyle · 2008-06-24 10:10 · Score: 1

If you did that at a bank you'd (a) have to be the system admin, as I bet no more than one or two people in the entire company would have that kind of access, (b) henceforth be unemployable, (c) lose everything as the lawyers took it off you, and (d) probably be facing criminal damage charges.
Re:The power of low standards by Tony+Hoyle · 2008-06-24 10:13 · Score: 1

They still do - try going to an ATM at around 7-8am.. they're all down for about 20 minutes a day (It's really annoying in fact if you want money before going into work and pick the wrong moment, as every bank does it at the same time).
Re:The power of low standards by Anonymous Coward · 2008-06-24 10:21 · Score: 0

Ergo Wikipedia is bound to be destroyed?
Re:The power of low standards by mpeskett · 2008-06-24 10:34 · Score: 2, Funny

I can promise you 6 i's of uptime.

But i^6 being -1, that's not a lot of uptime... if I ever provided you with anything it would be in excess of what I promised.
Re:The power of low standards by Anonymous Coward · 2008-06-24 10:56 · Score: 0

Pretty much every online banking website I've used is offline for scheduled maintenance from around 4am to 6am every Sunday -- and then there's the "unscheduled" maintenance. Sometimes, when I log into Bank of America I can access my checking/savings online but not my credit card account.
After Washington Mutual's site redesign which broke a few things, I filled out the "Give us feedback on our new site" box on the front page, which hit an internal server error when I tried to submit it.
Bank websites are probably the least reliable (in terms of being able to access my account info) out of most of the sites I visit.
Re:The power of low standards by Ghubi · 2008-06-24 11:24 · Score: 1

7-8am where?
Re:The power of low standards by pbhj · 2008-06-24 12:18 · Score: 1

And if there is a 1-hours downtime, EVER, you just blew through the scheduled downtime for the next 120 years.
Nah, you include some extra clause that obscurely says not counting scheduled downtime. Then add another few clauses that enable scheduling of downtime to be retrospective ... et voilà!
Re:The power of low standards by moosesocks · 2008-06-24 13:24 · Score: 1

That's funny, because most of the banks by me have about 12 hours of downtime every day (24 on Sundays and holidays...and if something goes wrong on a normal day, it's simply declared a "local bank holiday")

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:The power of low standards by emilper · 2008-06-24 19:19 · Score: 1

But surely you can find out which records were not added? Say, by checking the last_modified timestamps and finding out for which teachers there are no records on a particular day?
Availability is important, but IMHO being able to identify corrupted data is at least as important as availability, if not more ...
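A check like that might look as follows. This is a sketch only: the `teachers` and `clock_hours` tables, their columns, and the sample rows are invented for illustration, with SQLite standing in for whatever database is really in use.

```python
import sqlite3

# Hypothetical schema and sample data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE clock_hours (
    teacher_id    INTEGER NOT NULL REFERENCES teachers(id),
    hours         REAL    NOT NULL,
    last_modified TEXT    NOT NULL   -- 'YYYY-MM-DD HH:MM:SS'
);
INSERT INTO teachers VALUES (1, 'Teacher A'), (2, 'Teacher B'), (3, 'Teacher C');
INSERT INTO clock_hours VALUES
    (1, 7.5, '2008-06-23 16:02:00'),
    (3, 8.0, '2008-06-23 16:10:00'),
    (2, 8.0, '2008-06-20 15:55:00');
""")


def teachers_missing_records(conn: sqlite3.Connection, day: str) -> list:
    """Names of teachers with no clock_hours row last modified on the given day."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM teachers AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM clock_hours AS c
            WHERE c.teacher_id = t.id AND date(c.last_modified) = ?
        )
        ORDER BY t.name
        """,
        (day,),
    ).fetchall()
    return [name for (name,) in rows]


missing = teachers_missing_records(conn, "2008-06-23")
print(missing)
```

This only tells you whose records are absent for the day, not what the lost values were, so it narrows down the re-entry work rather than eliminating it.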
Re:The power of low standards by netcrusher88 · 2008-06-24 19:22 · Score: 1

Except it isn't really a promise of any of those things. You'll find that almost all SLAs say that scheduled maintenance and things outside the control of the provider - like *their* provider - don't count towards downtime for SLA purposes.

--
There's an old saying that says pretty much whatever you want it to.
Re:The power of low standards by TheLink · 2008-06-24 20:08 · Score: 1

Yeah, but for Wikipedia it really is no big deal - the users will be remaking and reverting those changes anyway.
--
- Too many replies beneath your current threshold
Re:The power of low standards by Teancum · 2008-06-25 03:54 · Score: 1

If you want the low-down and the mathematical theory on the whole thing, you might want to read up on The Shannon Limit that establishes the basis for determining the capacity of network bandwidth. In this case, a "network" can mean a whole bunch of things, and the theory was originally developed to help determine how to best set up a telephone network.
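For reference, the limit usually meant by this name is the Shannon-Hartley channel capacity. It is stated for a single noisy channel rather than for a whole network, so applying it to a server farm is an analogy at best:

```latex
% C   : channel capacity in bits per second
% B   : channel bandwidth in hertz
% S/N : signal-to-noise power ratio (linear, not in dB)
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```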
Claude Shannon was an employee of AT&T, and responsible for a great many theories regarding network organization that still have applications today. It is a pity that the old Bell Labs doesn't exist (at least as it once functioned) to get things like this developed/discovered today.
Wikipedia would be very interesting in terms of comparisons to the theoretical limits of this model. Of course, this is the absolute upper limit and in real life most networks have excess capacity well above even this limit.
Back in the 1970's and earlier, computer time (aka actually having access to the CPU in any form) was considered so valuable that most computer centers tried to run their CPUs at 100% capacity (or at least 95%+ capacity). With a 64k computer costing much more than $100,000, it isn't surprising that many businesses tried to get the maximum value out of those kinds of equipment.
Re:The power of low standards by Anonymous Coward · 2008-06-30 00:59 · Score: 0

actually, that precision of downtime would be remarkable!

I was just thinking that by imstanny · 2008-06-24 05:22 · Score: 2, Funny

Whenever I Google something, Wikipedia comes up near the top most of the time. Maybe that's why Google doesn't want to disclose its processing power: it may very well be a lot smaller than people assume.

Re:I was just thinking that by Anonymous Coward · 2008-06-24 05:38 · Score: 0

Yes, but anytime anyone googles anything, google has to do processing. Thus even though wikipedia is at the top for a small subset of searches (generally the ones for information as opposed to trends; commerce; specific blogs; &c.), google has a lot more work to do.
Let alone that google solves a logistic regression problem with (almost) every search, to figure out optimal adwords placements...
Re:I was just thinking that by Spatial · 2008-06-24 05:39 · Score: 1

But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
Re:I was just thinking that by imstanny · 2008-06-24 05:51 · Score: 2, Interesting

But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
On one hand, you're right, efficiency is admirable. But on the other hand, if Google has insane amounts of processing power, it would likely mean much higher barriers to entry for its competitors. The threat of Google's power in processing such data could deter others from even attempting to compete with Google. After all, when Google started it was only funded with a few hundred thousand dollars.
Re:I was just thinking that by Bandman · 2008-06-24 06:13 · Score: 1

Ever pay attention to the render times, though?
Their infrastructure is scary-massive, from almost every report

--
Check out my sysadmin blog!
Re:I was just thinking that by Albanach · 2008-06-24 06:21 · Score: 1

Yes, but anytime anyone googles anything, google has to do processing.

I'd have thought they'd use a caching solution just like wikipedia. After all, just as Wikipedia has some very popular pages and some less so, Google has many popular searches and many less so. Wouldn't they cache these? After all if you're dealing with millions of searches for 'george carlin' you wouldn't want to go query your entire DB every time, would you?
Re:I was just thinking that by Chris+Burke · 2008-06-24 06:32 · Score: 4, Interesting

I don't actually know anything about the total computing power Google employs, but I do know that they will purchase on the order of 1,000-10,000 processors merely to evaluate them prior to making a real purchase.

--

The enemies of Democracy are
Re:I was just thinking that by dubl-u · 2008-06-24 07:08 · Score: 4, Insightful

But why would they think it was a bad thing to expose? The whole "Look what we can do with so little" angle seems appealing; efficiency is something to boast about nowadays.
Turn it around. What does Google gain from exposing data about their internal performance?
Maybe they do well because they are amazingly CPU-efficient on a per-query basis. Maybe it's the opposite; they may be masters at lavishing CPU on every query, but know how to do that very cheaply. Most likely, it's a clever mix of the two.
Regardless, Google's engineering-fu and operations-fu are mighty, and a major competitive advantage. Releasing detailed data doesn't boost their reputation, as everybody already knows they are great. But it does give potential competitors an idea of what works well, making it easier for them to catch up with Google. As a rule, expect that any details you see from inside Google are old, boring, or vague. As Intel's Andy Grove said, "Only the paranoid survive."
Re:I was just thinking that by kiwimate · 2008-06-24 09:14 · Score: 3, Interesting

You know what I thought was interesting? This story (which was linked to from this /. story titled A Look At the Workings of Google's Data Centers) contained the following snippets.
On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory--you know the drill.
and
While Google uses ordinary hardware components for its servers...
But this was immediately followed by:
it doesn't use conventional packaging. Google required Intel to create custom circuit boards.
For some reason I'd always believed they used pretty much standard components in everything.
Re:I was just thinking that by Crazyswedishguy · 2008-06-24 11:20 · Score: 2, Interesting

After all, when Google started it was only funded with a few hundred thousand dollars.
Then again, when Google started, the Internet itself was considerably smaller, and the pages indexed by Google were much fewer. It was also slower and processing power wasn't as much of a limiting factor as your network connection.

Although the idea that Google may in fact be serving all our searches with just one server seems kind of appealing, let's not kid ourselves, they have many large data centers. They use relatively cheap, commonplace equipment, but in every data center they have guys with shopping carts (really) swapping out defective servers as they walk down the aisles. (their infrastructure and file system is really interesting, actually)
But don't forget that Google doesn't just provide search. They also provide storage-intensive services such as email (more than 6GB of storage space per account now I think) or video (YouTube). One of the main reasons for having many data centers is to be able to push content (email, YouTube videos, etc.) as close as possible to the end user before the user asks for it, to minimize latency. If user A in NY wants to watch a video, it goes much faster to send it from a data center in NY than to have to send it from CA. Serving video content or generally large amounts of data is a very capital intensive business that requires a lot of network and server infrastructure.

--
This space up for sale.
Re:I was just thinking that by TheRaven64 · 2008-06-24 22:06 · Score: 1

There have been a few /. articles on this subject. One thing they do (did / were planning on doing) was use a custom power distribution system, which (as I recall) has a per-rack AC to DC converter and then uses DC inside each rack. They also have things like USB chips removed from the motherboards - they may not use much power individually, but when you've got a few tens of thousands of them then this adds up (and they pay for power twice - once to get it into the data center and once to get the heat out). It also adds a lot to the cost - maybe only $0.50 per chip, but at the quantities Google buys that comes to a lot more than the cost of a custom board design. It's also one more thing that can go wrong, which is important when you consider their failure rate (they get through machines so quickly it's not worth their while trying to fix them anymore - just pull the broken one and drop in a new one). If the only thing wrong with it is a USB chip malfunctioning and causing voltage fluctuations on the motherboard then that's a problem.

--
I am TheRaven on Soylent News

quick everybody by daveatneowindotnet · 2008-06-24 05:23 · Score: 1, Redundant

read up on the Roman Republic now before Wikipedia gets Slashdotted

Easy to Increase the budget or add servers by Subm · 2008-06-24 05:23 · Score: 5, Funny

How hard can it be to increase the budget or add more servers?

Just go to the Wikipedia page with those numbers and change them. You don't even need to have an account.

Re:Easy to Increase the budget or add servers by xkhaozx · 2008-06-24 05:44 · Score: 1

Yeah, but those stupid admins keep reversing my changes.
Re:Easy to Increase the budget or add servers by owlnation · 2008-06-24 06:03 · Score: 0

Yeah, but those stupid admins keep reversing my changes.
What did you expect? Truth?
Re:Easy to Increase the budget or add servers by elrous0 · 2008-06-24 06:31 · Score: 5, Funny

In their defense, if you're going to run your entire site off a single server farm, a coastal city in Florida is the logical place to put it.

--
SJW: Someone who has run out of real oppression, and has to fake it.
Re:Easy to Increase the budget or add servers by khellendros1984 · 2008-06-24 07:04 · Score: 1

I'm fine with truthiness, myself.

--
It is pitch black. You are likely to be eaten by a grue.
Re:Easy to Increase the budget or add servers by DRobson · 2008-06-24 22:43 · Score: 1

Especially when the last five complete database dumps are corrupt

Some thoughts by morgan_greywolf · 2008-06-24 05:24 · Score: 1

"The traditional approach to availability isn't exactly our way," said Mituzas, who spoke about Wikipedia's infrastructure Monday at the O'Reilly Velocity conference.

More and more companies should look into approaches like this. Seriously. In tight economic times, a more ad-hoc approach saves money. People snubbed Google's approach to IT, and now it's becoming the standard in high availability for big dollar projects. But what about the small dollar approach? As economies slide into recession, you need to focus on a handful of highly-talented IT people rather than an army of droids.

--
My blog

Re:Some thoughts by Itninja · 2008-06-24 05:33 · Score: 1

you need to focus on a handful of highly-talented IT people rather than an army of droids
As long as these IT people are willing to work well below the industry pay-scale (often for free), then yeah, that would work great. Notice that most of the Wiki IT staff also have to have 'day jobs' to feed/clothe/house themselves.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:Some thoughts by TheLazySci-FiAuthor · 2008-06-24 05:35 · Score: 4, Insightful

"... you need to focus on a handful of highly-talented IT people rather than an army of droids."
This is so true; I've always said, "you get what you pay for."
Do you want to pay for software, or do you want to pay for people?
Only one can create the other.

--
Read my Very Short "Stories"
Re:Some thoughts by bsDaemon · 2008-06-24 05:36 · Score: 2, Insightful

Which is somehow different from any other open source project how?
Re:Some thoughts by morgan_greywolf · 2008-06-24 05:44 · Score: 5, Funny

Do you want to pay for software, or do you want to pay for people?
Only one can create the other.
Oh, gods, let's hope so!

--
My blog
Re:Some thoughts by Itninja · 2008-06-24 06:01 · Score: 1

It's not. But the parent was implying that corporations should follow the same model. I was just pointing out that for-profit companies need to pay their people a bit more than non-profit love-in projects like Wikipedia.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:Some thoughts by madfancier · 2008-06-24 06:56 · Score: 1

Do you want to pay for software, or do you want to pay for people?

Only one can create the other.

Not in Soviet Russia.
Re:Some thoughts by Smauler · 2008-06-24 09:53 · Score: 1

Already modded +5 funny, otherwise I'd have added. This is about the funniest one line post I've seen on /. - thanks for the laugh :).
Re:Some thoughts by dave420 · 2008-06-25 02:06 · Score: 1

Unfortunately for Master Shake and Carl, it can make "dogs".
Re:Some thoughts by Joey+Vegetables · 2008-06-25 02:21 · Score: 1

DNA can be viewed as software or as data. This has chilling implications. It is only a matter of time, and not much time at that, before sequenced DNA data can be used to create people (or plants, animals, microbes, etc.). It's already been done for smaller/simpler organisms.

--

Nonaggression works!

Interesting but... by wolf12886 · 2008-06-24 05:27 · Score: 1

Interesting to know, but I wish the article was more substantial than a list of tangential statistics. Also, although Wikipedia receives a hell of a lot of traffic, I bet it's at least an order of magnitude smaller than Google's.

If someone knows where we can find a good comparison between Wikipedia and others, as far as cost to traffic ratio, please speak up.

Re:Interesting but... by bobbozzo · 2008-06-24 11:03 · Score: 1

Yahoo gets over a billion page views per day.
Note that's different than 'http hits' or 'http requests' as those latter 2 include images, JS, CSS, ...

--
Nothing to see here; Move along.

Maybe... by nakajoe · 2008-06-24 05:28 · Score: 3, Funny

Datacenterknowledge.com might want to take lessons from Wikipedia as well. Slashdotted...

Note to self by Anita+Coney · 2008-06-24 05:28 · Score: 5, Funny

If you ever find yourself in a flamewar on Wikipedia you cannot win, bomb Tampa, Florida out of existence.

--
If someone says he and his monkey have nothing to hide, they almost certainly do.

Re:Note to self by canajin56 · 2008-06-24 05:43 · Score: 5, Funny

That's your solution to everything.

--
ASCII stupid question, get a stupid ANSI
Re:Note to self by Ron+Bennett · 2008-06-24 05:48 · Score: 4, Interesting

Or do a hurricane dance, and let nature do its thing...
Having all their servers in Tampa, FL (of all places given hurricanes, frequent lightning, flooding, etc there) doesn't seem too smart - I would have thought, given Wikipedia's popularity, their servers would be geographically spread out in multiple locations.
Though to do that adds a level of complexity and cost that even many for-profit ventures, such as Slashdot, likely can't afford / justify; Slashdot's servers are in one place - Chicago ... to digress a bit, I notice this site's accessibility has been spotty (i.e. more page not found / timeouts lately) since the server move.
Ron
Re:Note to self by TubeSteak · 2008-06-24 06:16 · Score: 2, Funny

That's your solution to everything.
I did ask if you wouldn't prefer a nice game of Chess.
-WOPR

--
[Fuck Beta]
o0t!
Re:Note to self by OverlordQ · 2008-06-24 06:22 · Score: 4, Informative

They're not all in Tampa, they have a bunch in the Netherlands and a few more in South Korea.

--
Your hair look like poop, Bob! - Wanker.
Re:Note to self by xpuppykickerx · 2008-06-24 06:31 · Score: 1

Please don't bomb Tampa. I will be homeless and very mad at you. Not that I will be able to post on Slashdot to express my anger.
Re:Note to self by LWATCDR · 2008-06-24 06:51 · Score: 1

Tampa hasn't been hit by many hurricanes. They don't have issues with flooding that I know about, and lightning is lightning. It can happen anywhere; just do your best to protect your systems from it.
If you are a few miles inland in Florida, hurricanes are not that big of an issue. If you have a good backup generator then it isn't that big of a problem.
Oh, did I mention I was born, live, and work in Florida? My office was hit by Frances, Jeanne, and Wilma. Total damage to the office... Nothing. Total damage to my home? Three shingles.
Florida doesn't tend to suffer from widespread flooding like places in the midwest, and really strong hurricanes like Andrew are actually very rare.
Most hurricanes in Florida would be a non-event if our power company kept the power up. We call Florida Power and Light Florida Flicker and Flash.
For a data center a backup power system is really all you need.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Note to self by Anonymous Coward · 2008-06-24 07:05 · Score: 0

Tampa isn't quite as risky as some other parts of florida, but it is something to think about. What's much more likely to take out a datacenter (imho) are random problems attributed to the maintenance of the datacenter. No matter what state you live in, if you have a huge high-traffic site you need to have a separate location with the bare minimum set up so you can get your site back on the web, even in a slightly-crippled form. Backups are important but next to that is having the spare hardware set up and ready to turn on when you need it.
Re:Note to self by skeeto · 2008-06-24 07:10 · Score: 1

Tampa is pretty safe from all that. I have grandparents that live in St. Petersburg (right next to Tampa) and they have never had any damage or been in danger from the weather. If Tampa had major flooding, then pretty much the whole state of Florida would be submerged too. At that point Wikipedia is low on the list of things to worry about.
Re:Note to self by colfer · 2008-06-24 07:19 · Score: 1

FutureQuest is a highly rated web host with its data center in Orlando, FL. It has never gone down, even in hurricanes. Very occasionally the network connects or upstreams fritz, but not due to storms (usually it's BGP, etc.).
If you recall there was some heroic blogging out of New Orleans after Katrina. Some guys at an ISP in a tall building downtown kept themselves wired, and described hard core telecom types patrolling the streets. Surreal.
Re:Note to self by BBandCMKRNL · 2008-06-24 07:32 · Score: 1

They don't have issues with flooding that I know about...
Tell that to one of my former employers who discovered their only production facility was in a Stage 1 evacuation zone when they were given 24 hours notice to evacuate. They produced life-critical items. Oops. The last I heard they set up a second production facility in a more secure part of the country.

--
Without the 2nd Amendment, the others are just suggestions.
Re:Note to self by Ma8thew · 2008-06-24 08:20 · Score: 1

So their other servers are in a country which is basically at sea level, and one which is under threat of invasion by North Korea, and experiences frequent earthquakes?
Re:Note to self by LWATCDR · 2008-06-24 08:25 · Score: 1

Why did he build it in a Stage 1 evacuation zone?
That is an avoidable problem.
But before you get all bent.
How long had it been since the last evacuation?
How much damage was done to the facility?
Did you have backup power? If so how long was the facility unavailable.
But like I said, that is a totally avoidable risk. Who puts a critical facility that close to the water in ANY coastal area? Hurricanes can hit anywhere on the east coast of the US from South Texas to Maine.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Note to self by skis · 2008-06-24 08:56 · Score: 1

George? Is that you?
Re:Note to self by Smauler · 2008-06-24 09:59 · Score: 1

This is completely OT, but why on earth do you think South Korea is under threat of invasion from North Korea? It could never ever happen. Firstly, South Korea has far more advanced weaponry than North Korea, and secondly - Do you really think the UN would do nothing against an invading country which the one superpower of the world already hates?
Re:Note to self by BitZtream · 2008-06-24 10:07 · Score: 1

Tampa has been hit by a few large hurricanes. Not so much recently, but it's due for a good one the way I see it. Panama City Beach has been devastated by at least 3 hurricanes in my lifetime, Cedar Key has been decimated, Ft Myers as well. Tampa has been lucky recently, but it's not any more immune than any other Gulf coast region.
Flooding due to rain in Florida is a non-issue, when the entire state is made of sand, water tends to fall through the ground faster than from the sky.
Storm surge on the other hand is a threat.
The reason you don't have problems with hurricanes in Florida is because they get hit every year with something. Florida has building codes that require the buildings to be able to take a little beating, so unless you're on the beach getting pounded by the surf directly, the building can generally deal with it in pretty much every instance except the killer that comes through and pushes the storm surge 2 miles inland.
Florida is not built like New Orleans, they have their buildings above sea level, even if only by a few feet in most places.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Note to self by Iron+Condor · 2008-06-24 10:59 · Score: 1

Just because there are problems that cannot be solved by bombing Florida doesn't mean that it isn't a good idea to do so anyways.

--
We're all born with nothing.
If you die in debt, you're ahead.
Re:Note to self by LWATCDR · 2008-06-24 12:38 · Score: 1

Exactly: don't build your data center on the beach, and have a backup generator, and you are not in that much danger. Unless you're a lot older than I am, Ft Myers hasn't been decimated in my lifetime. As I said, I have been through six hurricanes in my lifetime. And I mean through as in the eye passed over me.
At no time did the home I was in suffer any real damage. Compared to earthquakes, tornadoes, flooding, or any of a number of other issues they are pretty rare, mild, and easy to plan for.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Note to self by Rick+Genter · 2008-06-24 13:45 · Score: 2, Funny

an invading country which the one superpower of the world already hates?
China hates North Korea?

--
Don't underestimate the power of The Source
Re:Note to self by BBandCMKRNL · 2008-06-24 13:53 · Score: 1

Why did he build it in a Stage 1 evacuation zone?
That is an avoidable problem.
My guess is they didn't know the location was in the Stage 1 evacuation zone.
But before you get all bent.
Bent? I was simply responding to your previous comment:
They don't have issues with flooding that I know about...
to indicate that parts of Tampa Bay do have flooding issues. That's all.

--
Without the 2nd Amendment, the others are just suggestions.
Re:Note to self by Bodrius · 2008-06-24 15:58 · Score: 1

Can you post an example?

--
Freedom is the freedom to say 2+2=4, everything else follows...
Re:Note to self by Ma8thew · 2008-06-24 18:41 · Score: 1

I was actually joking, but thank you for informing my humorous remark.
Re:Note to self by Anonymous Coward · 2008-06-25 02:00 · Score: 0

George W. is that you?
Re:Note to self by dave420 · 2008-06-25 02:08 · Score: 1

Two centres dangerously close to water, and another dangerously close to a nuke-wielding dictatorship. Sounds solid to me!
Re:Note to self by LWATCDR · 2008-06-25 03:33 · Score: 1

Hey, if you put a critical facility in a zone one evac zone then it is your own fault. Zone one means that it is very close to the water.
Also you never get 24 hours notice. The NHC has a great site. If you are in Florida you watch it. It is very rare that you ever get less than 72 hours notice and of all the storms in the last 20 years that have hit Florida there was at least 48 hours of warning. The problem is way too many people take a look at the projected path and ignore the cone. If you are in the cone you are in the path as far as planning goes.
So your boss located a critical facility in a Stage one evacuation zone and had no disaster plan at all.
There is no place on this planet that is secure enough for that level of planning.
Florida really isn't any worse of a state to locate a data center in disaster wise than any other state.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.

More importantly by wolf12886 · 2008-06-24 05:36 · Score: 5, Interesting

I don't care how few servers they have, what's more interesting to me is that they run an ultra-high traffic site, which they aren't having trouble paying for, and do it without adds.

Re:More importantly by Anonymous Coward · 2008-06-24 08:04 · Score: 0

There's some element of their hosting that used to be donated. That would remove a sizable chunk from their bottom line.
Power, data, rent, etc..
Re:More importantly by Anonymous Coward · 2008-06-24 08:16 · Score: 0

> do it without adds.
Do it without adding what?

What is the role of Open Source by bogaboga · 2008-06-24 05:38 · Score: 1

I wonder how much of a role open source software is playing in Wikipedia's operations. How much is it? Anyone in the know?

Re:What is the role of Open Source by KokorHekkus · 2008-06-24 05:46 · Score: 4, Interesting

The wiki software, MediaWiki, was written for Wikipedia and is licensed under the GPL (http://www.mediawiki.org/wiki/How_does_MediaWiki_work%3F). According to Wikipedia they use MySQL as their database and run it all on Linux servers.
Re:What is the role of Open Source by Anonymous Coward · 2008-06-24 06:12 · Score: 0

MediaWiki, BTW, is pretty great. I was just handed a task of setting up an organization-wide wiki and I found it very easy to set up and customize. It runs fast with memcached and eaccelerator.
Re:What is the role of Open Source by guruevi · 2008-06-24 07:02 · Score: 2, Insightful

I don't know what else but open source you could use especially on the database side. You have only a few choices:
Microsoft ($$$) (approx. $50,000 per server per year in licensing costs since it's a public (unlimited CAL) enterprise-level site)
IBM ($$) (approx. $500,000 per year for leasing the whole operation, another load for support)
Oracle ($) (approx. $20,000 per backend and about 30 contractors for the next 5 years for the implementation)
Linux, MySQL, PHP (Free)
Not to mention, with Microsoft you'll need more servers to handle the same amount of load especially if you use Microsoft-based software package for the frontend as well (ASP.NET, MS CRM or SharePoint).
For IBM you'll have special hardware that nobody can handle but IBM certified support personnel.
For Oracle you're pretty much on your own anyway and you'll have to find a frontend.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:What is the role of Open Source by Simetrical · 2008-06-24 07:12 · Score: 1

Essentially all software used in the entire process of serving a web page to the user is free and open-source. The servers all run Linux; the wiki software is MediaWiki; the web servers are Apache and lighttpd; reverse proxying is done by Squid; the database is MySQL; search is done by Lucene; programming languages used in various places are PHP, Python, C, and C++; dynamic functionality is normally done using JavaScript, not Java/Flash/etc.; and so on. Non-free software is only used if there's a good reason to do so, and in the web server world, there generally isn't.

There is some non-free software used, however. The version of Lucene used is written in Java, which is not quite open-source yet (or at least I don't think the version used is). I think I've heard that the routers run non-free software. Some user-made tools that are loaded with every page run on the toolserver, which doesn't share the main project's open-source commitment and mostly runs Solaris; the tools themselves may also not necessarily be open-source.

--
MediaWiki developer, Total War Center sysadmin
Re:What is the role of Open Source by Carnildo · 2008-06-24 07:25 · Score: 1

I wonder how much of a role open source software is playing in Wikipedia's operations. How much is it? Anyone in the know?
I'm not aware of any software that Wikipedia uses that isn't open-source. They've got a very strong commitment to the free-content movement -- sometimes a little too strong: the only sound format they accept is Ogg Vorbis, the only video format Ogg Theora

--
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
Re:What is the role of Open Source by cmdrbuzz · 2008-06-24 07:36 · Score: 1

If you wrote this with an Oracle DB you'd most likely use App Express as the frontend. And it scales pretty well (Look at AskTom and Metalink for examples)
And I think you're exaggerating a little with the 30 contractors and 5 years..... By around 25 contractors and 4 1/2 years....
Re:What is the role of Open Source by David+Gerard · 2008-06-24 07:50 · Score: 1

It's OSS all the way through as far as is reasonably possible. Some bits are written in Java (the Lucene search), a couple of Toolserver machines run Solaris, there's a lot of Macs in the WMF office.

--
http://rocknerd.co.uk
Re:What is the role of Open Source by BitZtream · 2008-06-24 10:25 · Score: 1

Okay, I'll bite.
You don't need unlimited CALs, the users aren't connecting to the database software, the server is. And it's not connecting for every page.
Can't speak for DB2.
If you're using 30 contractors and 5 years to implement an Oracle database please stop posting newspaper ads to hire your staff. Might I suggest starting off by knowing a little about DBs so you can hire a proper Oracle DBA.
MSSQL, sadly, nowadays is rather speedy in a proper large scale setup where it's meant to be used. I hate to admit it just as much as any other self respecting unix lover, but it's not as crappy as you'd like to make us think.
If you think you need ASP.NET to talk to MSSQL you aren't a developer. MS CRM is a 'Customer Relations Manager' and has absolutely nothing to do with this discussion. SharePoint is for sharing internal office documents for the most part and again has no relation to this discussion. For that matter, SharePoint really has no business existing, but that's another discussion.
IBM is still in business?
You can easily use MediaWiki with Oracle or MSSQL if you're willing to fix up the SQL queries for it; most of them would work out of the box, the database creation scripts would need some fixing and the actual statements used at runtime would need some minor changes I'm sure, but you could use them. MediaWiki/Wikipedia doesn't depend on some outstanding feature that only MySQL has, mostly because MySQL has no such feature. It's fast as well, but simple. Properly configured, Oracle and MSSQL should be able to keep up, or close to it, with the same type of queries, properly optimized for them.
You can run PHP and MySQL on Linux or Windows. You can hook MediaWiki up to Postgres out of the box on Linux and Windows, with PHP on Apache or IIS or lighttpd or any other FastCGI capable server.
Now that I've typed all this ... do you actually know anything about these databases or software development? If you're going to be a fanboy or zealot, at least get a damn clue first, you make the rest of us look bad.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:What is the role of Open Source by Anonymous Coward · 2008-06-29 04:43 · Score: 0

Wikipedia will also accept .ogg dirac video (don't think it accepts .ogv extensions yet)

Out like a light by Joebert · 2008-06-24 05:40 · Score: 1

300 servers housed in a single data center in Tampa, Fla.

Did Wikipedia go down when hurricanes Charley/etc came through a few years ago?
I lost power for about a week when that happened and I only live about 15 miles from Tampa, right over the Courtney Campbell Causeway actually.

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.

Re:Out like a light by timstarling · 2008-06-24 06:39 · Score: 2, Informative

We've never lost external power while we've been at Tampa, but if we did, there are diesel generators. Not that it would be a big deal if we lost power for a day or two. There's no serious problem as long as there's no physical damage to the servers, which we're assured is essentially impossible even with a direct hurricane strike, since the building is well above sea-level and there are no external windows.
Re:Out like a light by Joebert · 2008-06-24 06:52 · Score: 1

Well then, I guess we all know where I'm going next time a hurricane rolls through. :)

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.

What amazes me... by Anonymous Coward · 2008-06-24 05:42 · Score: 0

What amazes me is that not only do they manage all this traffic on such a small infrastructure, but even with them being on the front page of /. the site is still up.

Re:What amazes me... by c0ol · 2008-06-24 05:54 · Score: 1

Seriously? Slashdot is not even a blip on their traffic...
Re:What amazes me... by ceejayoz · 2008-06-24 06:08 · Score: 4, Interesting

Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.
Wikipedia's probably getting hit with hundreds of times the traffic Slashdot is at all times.
Re:What amazes me... by Doug+Neal · 2008-06-24 06:22 · Score: 1

Correct
Re:What amazes me... by quanticle · 2008-06-24 07:00 · Score: 1

To be quite honest, I'd say that the Slashdot surge is probably a drop in the bucket as far as Wikipedia is concerned. I mean, they're the top result for loads of Google queries, and plenty of people go straight to Wikipedia when they need to look something up.

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:What amazes me... by HarvardAce · 2008-06-24 07:19 · Score: 2, Insightful

(link to Alexa graph)
One problem about Alexa is that it only gathers statistics from those who install the Alexa toolbar...I would tend to think that the Slashdot crowd would be a group that predominantly avoids installing that sort of thing. I actually think there was a discussion on this on Slashdot many months ago.

That said, I'm sure that the traffic to Wikipedia is probably several orders of magnitude higher than that of Slashdot.

--
Note to self: Stop putting jokes in my insightful comments so I can get something other than +1 Funny!
Re:What amazes me... by Simetrical · 2008-06-24 07:24 · Score: 1
It's probably more illuminating to look at those separately:
- Slashdot reach: ~0.03% per day
- Wikipedia reach: ~10% per day
Wikipedia gets roughly 300 times the traffic that Slashdot does, according to Alexa. And that doesn't even count the sister projects. wikimedia.org gets 0.6% reach, 20 times Slashdot. Slashdot isn't even up to some of the small projects like Wiktionary and Wikibooks. To quote the Wikimedia Bugzilla's quips list,

Xirzon: are the servers up for slashdotting ?
brion: we get more traffic than /. usually we don't even notice the bump on the traffic graphs
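
The ratios above are easy to check with a couple of divisions. A minimal sketch in Python, using only the Alexa reach percentages quoted in this comment; the variable and function names are made up for illustration, and reach (share of toolbar users visiting per day) is only a rough stand-in for traffic:

```python
# Alexa daily reach figures quoted in the comment, in percent.
# Reach is a proxy for traffic here, not a measurement of it.
slashdot_reach = 0.03
wikipedia_reach = 10.0
wikimedia_reach = 0.6


def reach_ratio(a, b):
    """Return how many times larger reach `a` is than reach `b`."""
    return a / b


wikipedia_vs_slashdot = reach_ratio(wikipedia_reach, slashdot_reach)
wikimedia_vs_slashdot = reach_ratio(wikimedia_reach, slashdot_reach)

print(f"Wikipedia vs Slashdot: about {wikipedia_vs_slashdot:.0f}x")
print(f"wikimedia.org vs Slashdot: about {wikimedia_vs_slashdot:.0f}x")
```

So "300 times" is a rounded-down version of about 333, and the wikimedia.org figure of 20 times follows directly from 0.6 / 0.03.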
--
MediaWiki developer, Total War Center sysadmin
Re:What amazes me... by dubl-u · 2008-06-24 07:38 · Score: 3, Insightful

Slashdot is great at taking down sites on crappy shared hosting, but anything with a decently configured dedicated server will likely survive just fine.
Sounds right to me. I don't have any terribly recent data on a slashdotting, but I think the Slashdot-as-server-killer meme is pretty stale.
Looking at some old data and extrapolating, I'd guess a modern slashdotting would peak at 200 pageviews/min, or ~3 pv/sec. Get mentioned on Good Morning America or Oprah, on the other hand, and you're looking at 20-200 pageviews/sec. I'd guess that getting on Digg's front page is somewhere in the 20-40 pv/sec range.
A slashdotting was a big deal back when every nerd used it and the Internet was mainly nerds. Neither is true anymore.
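
For anyone who wants to see the guesses above side by side, here is a small Python sketch that converts everything to pageviews per second. All numbers are the poster's own estimates from this comment, not measurements, and the names are hypothetical:

```python
SECONDS_PER_MINUTE = 60


def per_second(pageviews_per_minute):
    """Convert a pageviews-per-minute rate to pageviews per second."""
    return pageviews_per_minute / SECONDS_PER_MINUTE


# Guessed peak of a modern slashdotting: 200 pageviews/min.
slashdot_peak = per_second(200)

# Guessed ranges in pageviews/sec for the other sources.
tv_mention = (20, 200)   # Good Morning America or Oprah
digg_front = (20, 40)    # Digg front page

print(f"slashdotting peak: {slashdot_peak:.1f} pv/sec")
for name, (low, high) in [("TV mention", tv_mention), ("Digg front page", digg_front)]:
    print(f"{name}: {low / slashdot_peak:.0f}x to {high / slashdot_peak:.0f}x a slashdotting")
```

On these guesses a TV mention is somewhere between 6 and 60 times a slashdotting, and a Digg front page between 6 and 12 times.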
Re:What amazes me... by David+Gerard · 2008-06-24 07:51 · Score: 1

Wikipedia: the site that slashdots itself!

--
http://rocknerd.co.uk
Re:What amazes me... by Lazy+Jones · 2008-06-24 10:34 · Score: 1

Looking at some old data and extrapolating, I'd guess a modern slashdotting would peak at 200 pageviews/min, or ~3 pv/sec.
I doubt that, a typical Googlebotting does more than that...

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)
Re:What amazes me... by dubl-u · 2008-06-24 19:16 · Score: 1

I doubt that, a typical Googlebotting does more than that...
Doubt all you want, but that's basically my point.
The slashdotting I had data on was from a few years back, but it was circa 100 pageviews/min. From what I can tell, Slashdot's traffic has been pretty steady the last few years. So I doubled the one I had data on. I did a little more rummaging, and as far as I can tell, my point stands: Slashdot has nothing on Oprah.

Off-topic, I know, but...what about /.'s hardware? by kiwimate · 2008-06-24 05:44 · Score: 5, Interesting

I.e. the promised follow-up to this story about moving to the new Chicago datacenter? You know, the one where Mr. Taco promised a follow-up story "in a few days" about the "ridiculously overpowered new hardware".

I was quite looking forward to that, but it never eventuated, unless I missed it. It's certainly not filed under Topics->Slashdot.

Tampa? by QuietLagoon · 2008-06-24 05:44 · Score: 0

300 servers housed in a single data center in Tampa, Fla.

Does anyone see the lack of planning that resulted in the placement of a major data center in the thunderstorm and lightning-strike capital of the world?

Re:Tampa? by nickull · 2008-06-24 05:46 · Score: 2, Funny

Not to mention hurricanes and faulty electronic voting machines.... ;-)

--
"Question everything, including this!" - http://technoracle.blogspot.com/
Re:Tampa? by midom · 2008-06-24 07:00 · Score: 2, Informative

add power costs, difficulty to travel to, possible flooding, etc. it is all for historic reasons, we can't just migrate datacenters at will - that requires quite a high investment. and the datacenter choice was simply because the founder lived there in 2001, when all we needed was a single server. --Domas
Re:Tampa? by Anonymous Coward · 2008-06-24 17:42 · Score: 0

Wikipedia doesn't have any one founder and the hosting was located in Florida as of 2004.
Re:Tampa? by midom · 2008-06-25 05:13 · Score: 1

because Jimmy moved there back then?

Works great because it's not "Web 2.0" by Animats · 2008-06-24 05:45 · Score: 5, Insightful

Most of Wikipedia is a collection of static pages. Most users of Wikipedia are just reading the latest version of an article, to which they were taken by a non-Wikipedia search engine. So all Wikipedia has to do for them is serve a static page. No database work or page generation is required.

Older revisions of pages come from the database, as do the versions one sees during editing and previewing, the history information, and such. Those operations involve the MySQL databases. There are only about 10-20 updates per second taking place in the editing end of the system. When a page is updated, static copies are propagated out to the static page servers after a few tens of seconds.

Article editing is a check-out/check-in system. When you start editing a page, you get a version token, and when you update the page, the token has to match the latest revision or you get an edit conflict. It's all standard form requests; there's no need for frantic XMLHttpRequest processing while you're working on a page.
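
A minimal sketch of that token check, assuming the token is simply the revision number the editor started from. The `Page` class and its method names are hypothetical illustrations, not MediaWiki's actual code:

```python
class EditConflict(Exception):
    """Raised when a save is based on a revision that is no longer the latest."""


class Page:
    def __init__(self, text):
        self.revisions = [text]  # the index in this list doubles as the revision id

    def start_edit(self):
        """Check out: return the current text plus a token naming its revision."""
        token = len(self.revisions) - 1
        return self.revisions[token], token

    def save(self, new_text, token):
        """Check in: accept the edit only if the token still names the latest revision."""
        latest = len(self.revisions) - 1
        if token != latest:
            raise EditConflict(f"based on revision {token}, latest is {latest}")
        self.revisions.append(new_text)
        return latest + 1


page = Page("Tampa is a city.")
_, alice_token = page.start_edit()
_, bob_token = page.start_edit()
page.save("Tampa is a city in Florida.", alice_token)  # accepted, creates revision 1
try:
    page.save("Tampa is a big city.", bob_token)  # stale token, rejected
    conflict = False
except EditConflict:
    conflict = True
```

The server holds no lock while someone is typing; the only work happens on the two form requests, which is why no background requests are needed during editing.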

Because there are no ads, there's no overhead associated with inserting variable ad info into the pages. No need for ad rotators, ad trackers, "beacons" or similar overhead.

Re:Works great because it's not "Web 2.0" by Anonymous Coward · 2008-06-24 05:53 · Score: 0

+1 Insightful. I would do the mod myself, if I could just find those darn mod points I had last week...
Re:Works great because it's not "Web 2.0" by internic · 2008-06-24 06:11 · Score: 1

Oh really? Because O'Reilly seems to think it is, and I thought he was the main pusher of this terminology. Is the term Web 2.0 actually meaningful?

--
"You call it a new way of thinking; I call it regression to ignorance!" -- Operation Ivy
Re:Works great because it's not "Web 2.0" by Anonymous Coward · 2008-06-24 06:17 · Score: 1, Informative

There is practically no such thing as a static page in Wikipedia. We're running 2 small Wikipedia mirror clusters, and it's quite obvious that if you don't run memcached alongside Apache, all pages are rendered from the database on demand, for every single request. Large and complex pages (e.g. on Hydrogen or Gold) take more than 1 second to render even on the fastest CPUs available.
You make things sound cheap and simple, but without the memcached and the squid clusters Wikipedia is using, the whole thing would require significantly more hardware than the foundation could afford.
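
To illustrate why the cache matters, here is a rough cache-aside sketch. A plain dict stands in for memcached, and `render_from_database` is a hypothetical placeholder for the expensive parse-and-render step; keying on title plus revision is an assumption made for the example:

```python
render_count = 0
cache = {}  # stand-in for memcached; a real deployment would use a memcached client


def render_from_database(title, revision):
    """Hypothetical expensive step: load wikitext and build the HTML."""
    global render_count
    render_count += 1
    return f"<h1>{title}</h1><!-- rev {revision} -->"


def get_page_html(title, revision):
    """Cache-aside lookup: render only on a miss, then remember the result."""
    key = (title, revision)
    html = cache.get(key)
    if html is None:
        html = render_from_database(title, revision)
        cache[key] = html
    return html


for _ in range(1000):
    get_page_html("Hydrogen", 42)   # only the first call renders
get_page_html("Hydrogen", 43)       # an edit means a new revision, so a new key
```

With the cache in place, a thousand reads of the same revision cost one render instead of a thousand.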
Re:Works great because it's not "Web 2.0" by Anonymous Coward · 2008-06-24 06:20 · Score: 0

The search box is using XMLHttpRequest for a context-sensitive combo dropdown now.
Re:Works great because it's not "Web 2.0" by Tweenk · 2008-06-24 06:38 · Score: 2, Informative

If you haven't noticed, "Web 2.0" is a long established buzzword - which means it carries little meaning, but it looks good in advertising. Just like "information superhighway", "enterprise feature" or "user friendly".

--
Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
Re:Works great because it's not "Web 2.0" by adri · 2008-06-24 23:35 · Score: 1

It's not a big deal if they wanted to introduce dynamic ad type content at the edge. Oh it would be - we'd have to finish implementing ESI in a useful fashion (the current implementation in Squid-3 is not usable.) You just have to know what you're doing and be willing to learn about caching. Caching isn't evil, honest. :)
They've built a scalable solution to their problem space. People should really sit down and define their problem and solution spaces before they build things.
Adrian
(I'm one of the Squid developers.)

I hate web sites that are broken on purpose! by Anonymous Coward · 2008-06-24 05:45 · Score: 0

How the hell are we supposed to read the text with an ad hiding the text? What idiot decided that it was a good decision to go to the hard work to create content only to hide it?

Confused by the title by Just+Some+Guy · 2008-06-24 05:52 · Score: 5, Insightful

What does "Non-Profit Budget" mean, anyway? There are non-profits bigger than the company I work for. Non-profit isn't the same as poorly financed.

--
Dewey, what part of this looks like authorities should be involved?

Re:Confused by the title by perbu · 2008-06-24 06:28 · Score: 1

I guess it means that the budget is not necessarily scaled in the same way it might have been if they were a commercial company. In a commercial company more traffic means more money - not so for WP.
Re:Confused by the title by quanticle · 2008-06-24 07:19 · Score: 2, Interesting

Good point. Perfect example: the Bill and Melinda Gates Foundation has a budget of billions of dollars, easily exceeding the budget of many private corporations.

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:Confused by the title by dubl-u · 2008-06-24 08:03 · Score: 1

What does "Non-Profit Budget" mean, anyway? There are non-profits bigger than the company I work for. Non-profit isn't the same as poorly financed.
But it generally means there's little connection between demand and payment. For-profit businesses generally only do things that they expect will make them money. So commercial web sites try to serve pages that get them something.
Non-profits, on the other hand, try to fulfill a mission with whatever resources they have at hand. Demand almost always exceeds supply, and many can't or won't charge to bring them back in balance. This leads to endemic under-funding and miserly budgeting.
Web sites are especially prone to this. They can't control demand, and unlike most non-profit services, they can't limit supply. So you make do. Or you deliver with lowered quality, which Wikipedia has had fits of.
Re:Confused by the title by corbettw · 2008-06-24 12:08 · Score: 1

Good point. Perfect example: the Bill and Melinda Gates Foundation has a budget of billions of dollars, easily exceeding the budget of many small countries.
FTFY.

--
God invented whiskey so the Irish would not rule the world.

Link to wikipedia? by Luyseyal · 2008-06-24 05:54 · Score: 4, Funny

The summary was wrong to include a link to the Wikipedia homepage without a Wikipedia link about Wikipedia in case you don't know what Wikipedia is. I myself had to Google Wikipedia to find out what Wikipedia was so I am providing the Wikipedia link about Wikipedia in case others were likewise in the dark regarding Wikipedia.

-l

P.s., Wikipedia.

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!

Re:Link to wikipedia? by Chaotic+Spyder · 2008-06-24 06:36 · Score: 1

you must be new here

--
Losers whine about their best, Winners go home to fuck the prom queen
Re:Link to wikipedia? by hansamurai · 2008-06-24 06:53 · Score: 4, Funny

Wait, what's this Google thing you're talking about?

--
Reviewing just the first hour of video games.
Re:Link to wikipedia? by hansamurai · 2008-06-24 06:56 · Score: 5, Funny

Nevermind, found it:
http://www.google.com/search?q=google

--
Reviewing just the first hour of video games.
Re:Link to wikipedia? by felipekk · 2008-06-24 07:01 · Score: 2, Funny

http://en.wikipedia.org/wiki/Google
Re:Link to wikipedia? by srollyson · 2008-06-24 07:19 · Score: 2, Funny

[citation needed]
Re:Link to wikipedia? by Joeyspecial · 2008-06-24 09:08 · Score: 1

It's pretty cool that the non-Google page that comes up when you search for Google is the Wikipedia page for Google. That's probably why you weren't modded off-topic.
Re:Link to wikipedia? by whathappenedtomonday · 2008-06-24 09:19 · Score: 1

wow, 2,600,000,000 hits - I must have missed a major hype here, brb...

--
I hope I didn't brain my damage.
Re:Link to wikipedia? by tehcyder · 2008-06-25 01:57 · Score: 1

Nah, that site looks really plain and boring, it will never compete with the likes of Yahoo! or MSN. I mean, even slashdot has some video ads on, they need to try harder.

--
To have a right to do a thing is not at all the same as to be right in doing it

Distributed computing? by Bombula · 2008-06-24 05:55 · Score: 1

I'm kind of surprised there's not been more talk about a distributed computing effort for wikipedia. Seems like it would be a good candidate. I'm more of an honorary geek than an actual hardcore tech-savvy person - does anyone know if a distributed computing effort could work? I don't really see any problem with data integrity, since it's not confidential and is open to editing by definition (except maybe user info?), so it'd basically be a big asymmetric RAID, right? I would worry more about it having fast enough response times - but maybe even that isn't so much of an issue given the nature of wikipedia's content. I suppose syncing the data as it gets edited would be the biggest issue... But what do I know?

Thoughts, everyone?

--
A-Bomb

Re:Distributed computing? by Tweenk · 2008-06-24 06:53 · Score: 1

The problem is that there's not much to compute at Wikipedia. The limiting factor is bandwidth. A distributed web cache like Coral Cache might work, but this generally isn't called distributed computing, just like P2P networks aren't. The main problem would be that web caches have high update latency, but probably it wouldn't matter too much on Wikipedia.

--
Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
Re:Distributed computing? by TheRaven64 · 2008-06-24 22:37 · Score: 1

Something like FreeNet would work well for Wikipedia, since it caches the most commonly accessed pages in a large number of nodes. FreeNet isn't designed to be searched, however, so could not be used directly.

--
I am TheRaven on Soylent News

Simplicity by wsanders · 2008-06-24 06:01 · Score: 5, Interesting

Although much of the Mediawiki software is a hideous twitching blob of PHP Hell, the base functionality is fairly simple and can run perpetually and scale massively as long as you don't mess with it.

What spoils a lot of projects like this is the constant need for customization. Mediawiki essentially can't be customized (except for plugins obviously, which you install at your own peril) and that is a big reason why it scales so massively.

As for Wikipedia itself, I suspect it is massively weighted in favor of reads. That simplifies circumstances a lot.

--
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"

Re:Simplicity by rrohbeck · 2008-06-24 08:24 · Score: 1

And notice that the vast majority of pages are semi-static and served from squid caches. Ideally, they only have to go back to PHP when a page changes.

--
thegodmovie.com - watch it
Re:Simplicity by Anonymous Coward · 2008-06-24 08:38 · Score: 0

Bah, this simply isn't even close to true. Take a look at Wikimedia's different projects. Some of them have drastically different technical needs. Take a look at some of the extensions used on the different sites. Just look at Wikisource's Special:Version page for a good list of extensions being used.
Also, take a look at Wikia, MetaVid Wiki, or WikiHow for a good example of how different MediaWiki driven sites can be.
MediaWiki is very extensible, and there isn't much danger in installing most/all of the extensions that are installed on any of the major sites.

Cached on servers all over the interweb? by ClarisseMcClellan · 2008-06-24 06:01 · Score: 1

In the early days of the WWW the idea with popular pages was that they could be cached all over the internet. Your server checks with their server and if it has the page in cache already then that is what gets served up. What happened to that idea, and why can't Wikipedia work like that, with only obscure and new pages getting served up from Florida?
Those 300 servers are one of the wonders of the world and if you have never made an edit then you should. There must be something you can add to the whole.
There has been much talk of other encyclopaedias but I am still waiting.

Re:Cached on servers all over the interweb? by IamTheRealMike · 2008-06-24 06:40 · Score: 1

Lots of ISPs run transparent caching proxy servers so wikipedia could be cached if they wanted. They set their headers to prevent that though, presumably so changes show up immediately.
Re:Cached on servers all over the interweb? by TheRaven64 · 2008-06-24 22:45 · Score: 1

For something like Wikipedia, you really want a lightweight invalidation protocol, where each URL has a version associated with it, and the proxy can send a single packet query to find the current version of it. If the current version is the cached version then it would serve that, otherwise it would pass the request on. You could maybe integrate this into HTTP by having a header which specifies the version you have in your cache, and having it either return the newer version (with a header indicating its version number) or a simple reply code indicating that you should use your cache.
Actually, this kind of thing would be good for browsers too, since it would allow them to use less bandwidth fetching large images and so on that they already have in their caches. I wonder why it hasn't been implemented.

--
I am TheRaven on Soylent News
Re:Cached on servers all over the interweb? by adri · 2008-06-24 23:39 · Score: 2, Informative

It exists. It's called "validators". There are strong and weak validators. You can Vary on your validators, and thus have multiple copies of the same object but in different forms (so given a text document, you can have it in different languages, compressed/uncompressed, etc.)
Your browser will then quite happily ask the origin server (which may not be the "origin" origin) for an object and provide validators. (Last-Modified -> If-Modified-Since; ETag->If-None-Match) which the origin (or the cache which is pretending to be the origin) can check against its local copy and then return a "yes, use your local copy" or "no, don't bother."
It's all there, right now, in HTTP/1.1. I swear. People just don't have a clue how to use caching, they've been bitten by the difference between "expiry" and "revalidation", and they just turn off all hope of caching. Maybe they're scared; maybe their job is to sell bits; maybe they're just clueless about it and turning off caching fixed an obscure problem. In any case, it's right there in HTTP/1.1 and you can use it any time you like.
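
A rough sketch of the ETag / If-None-Match exchange from the origin's side. The `respond` and `make_etag` helpers are hypothetical names for illustration, hashing the body is just one way to build a strong validator, and real servers also have to handle lists of tags and `*` in If-None-Match, which this skips:

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong validator derived from the content (one possible choice)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    # max-age=0 plus must-revalidate: the copy expires at once, but it can
    # still be revalidated cheaply instead of being re-downloaded.
    headers = {"ETag": etag, "Cache-Control": "max-age=0, must-revalidate"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # "yes, use your local copy"
    return 200, headers, body


article = b"<html>Hydrogen is element 1.</html>"
status1, headers1, _ = respond({}, article)                       # first fetch
status2, _, body2 = respond({"If-None-Match": headers1["ETag"]}, article)
status3, _, _ = respond({"If-None-Match": headers1["ETag"]}, article + b" edited")
```

The second request costs a round trip but no body transfer; only the third, after the content changed, sends the page again. That is the "revalidation" half that tends to get thrown out along with "expiry".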
Adrian
(I'm a Squid developer.)

Sure they do it without ads... by DerekLyons · 2008-06-24 06:20 · Score: 3, Informative

Sure they do without ad income. But they also do it without having to pay salaries, or co location fees, or bandwidth costs... (I know they pay some of those, but they also get a metric buttload of contributions in kind.)

When your costs are lower, and your standard of service (and content) malleable, it is easy to live on a smaller income.

Re:Sure they do it without ads... by quanticle · 2008-06-24 06:55 · Score: 1

But they also do it without having to pay salaries, co location fees, or bandwidth costs...
Well, as far as salaries go, yeah, they don't have to pay for a full team of developers and administrators for the business, but they do need to pay people to go and check on the servers, replace faulty hardware, etc. Also, as far as co-location costs go, I'd say that running your own data center (i.e. providing your own electricity, cooling, backup power supplies, etc.) can't be cheap either.

--
We all know what to do, but we don't know how to get re-elected once we have done it

so what's "Web 2.0"? by gbjbaanb · 2008-06-24 06:22 · Score: 1

I take it that "Works great because it's not "Web 2.0"" means it's fast and dynamic, whereas Web 2.0 generally means slow and dynamic.

The technology behind it is irrelevant; if content is provided by users then it's web 2.0 (as I understand the term), so Wikipedia definitely is web 2.0, it's just that they have some fancy caching mechanism to get the best of both worlds. If only more systems were built in a pragmatic way instead of worrying about what it's "supposed" to be.

Re:so what's "Web 2.0"? by quanticle · 2008-06-24 07:09 · Score: 1

I take it that "Works great because it's not "Web 2.0" means that its fast and dynamic, whereas Web 2.0 generally means slow and dynamic.
Web 2.0 is a shorthand version of saying "dynamic pages served using Asynchronous JavaScript and XML (AJAX)". Now, if you reread the parent, you'll see that he says:
Most of Wikipedia is a collection of static [emphasis mine] pages. Most users of Wikipedia are just reading the latest version of an article... So all Wikipedia has to do for them is serve a static page.
In other words, the parent is saying that Wikipedia is effective because it avoids any sort of dynamism for the majority of use cases. Heck, even article editing isn't dynamic on Wikipedia. When you click the edit link, you're taken to a separate page which has a prepopulated form with the wikitext of the article. The only bit of dynamic content on Wikipedia I can remember is the new search box, which uses a bit of AJAX to generate autocomplete possibilities.

--
We all know what to do, but we don't know how to get re-elected once we have done it

Nonsense. Wikipedia is THE web 2.0 by Nicolas+MONNET · 2008-06-24 06:23 · Score: 4, Insightful

Web 2.0 is not just about flashy Ajax or what not, it's about user generated dynamic content. WP's "everything is a wiki" architecture might /look/ a bit archaic compared to fancy schmancy dynamic rotating animated gradient-filled forums, but it's much more powerful.
Moreover, WP is not a collection of static pages, if you're logged in at least, every page is dynamically generated, and every page's history is updated within a few seconds.

Re:Nonsense. Wikipedia is THE web 2.0 by quanticle · 2008-06-24 07:14 · Score: 1

Moreover, WP is not a collection of static pages, if you're logged in at least, every page is dynamically generated, and every page's history is updated within a few seconds.
That's not how it works. If you're just browsing Wikipedia, you're just looking at a collection of static pages that were generated earlier and cached. Only when you actually edit the page and save it is the page updated.

If Wikipedia had to freshly create every page for every user, even computational power on the order possessed by Google wouldn't be up to the task.

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:Nonsense. Wikipedia is THE web 2.0 by dave420 · 2008-06-25 02:16 · Score: 1

Nope, Web2.0 is a buzzword that means absolutely nothing. User-generated content has been on the web as long as the web has existed.

Wikipedia = much more traffic than slashdot by Anonymous Coward · 2008-06-24 06:28 · Score: 5, Interesting

Slashdot does .. what? 40 mbit of traffic at peak? Wikipedia is roughly 100 times larger. (And WP has three datacenters, not one)

Slashdot traffic hasn't created noticeable blips on Wikipedia's radar for years.

OTOH, if Wikipedia linked slashdot on every page slashdot would go down, if due to nothing else but bandwidth exhaustion.

Re:Wikipedia = much more traffic than slashdot by hostyle · 2008-06-24 07:38 · Score: 5, Funny

OTOH, if Wikipedia linked slashdot on every page slashdot would go down, if due to nothing else but bandwidth exhaustion.
Sounds like a dare to me. Gentlemen, start your packets!

--
Caesar si viveret, ad remum dareris.
Re:Wikipedia = much more traffic than slashdot by Haoie · 2008-06-24 09:39 · Score: 2, Informative

That's pretty obvious because Wiki has, literally, millions of topics covering every possible field. Whereas /. is very limited in scope.

--
If each mistake being made is a new one, then progress is being made.
Re:Wikipedia = much more traffic than slashdot by beav007 · 2008-06-24 13:45 · Score: 3, Funny

bandwidth exhaustion
Welcome to ************ broadband tech support. How can I help?

"My internet is running very slowly tonight. Why is that?"

Well sir, it looks like you've been downloading from the other side of the continent. I'd say that your packets are just very tired by the time they reach you...
Re:Wikipedia = much more traffic than slashdot by BooRolla · 2008-06-24 14:27 · Score: 5, Funny

If only there were some way to put links on to Wikipedia!
Re:Wikipedia = much more traffic than slashdot by hcdejong · 2008-06-24 22:14 · Score: 1

Slashdot does .. what? 40 mbit of traffic at peak? Wikipedia is roughly 100 times larger.
Apples and oranges. The question is not 'how much traffic do the /. servers handle' but 'how much (peak) traffic on Wikipedia is generated by /. users'.

Re:Off-topic, I know, but...what about /.'s hardwa by larry+bagina · 2008-06-24 06:32 · Score: 1

Remember when CmdrTaco called wikipedia a fad and said they couldn't scale? It was during the last (only?) slashdot IRC "interview" a few years back. Just before wikipedia overtook /. in traffic.

--
Do you even lift?

These aren't the 'roids you're looking for.

Servers and locations by Anonymous Coward · 2008-06-24 06:40 · Score: 2, Informative

According to http://meta.wikimedia.org/wiki/Wikimedia_servers Wikimedia (and by extension, Wikipedia):

"About 300 machines in Florida, 26 in Amsterdam, 23 in Yahoo!'s Korean hosting facility."

also: http://meta.wikimedia.org/wiki/Wikimedia_partners_and_hosts

It's easy... by CarpetShark · 2008-06-24 06:43 · Score: 1

If wikipedia is anything to go by, you just don't include a decent search engine.

Re:It's easy... by Hillgiant · 2008-06-24 08:23 · Score: 3, Insightful

Why? If you want search, go to google. If you want an encyclopedia, go to wikipedia. It's pretty simple, really.

--
-
Re:It's easy... by xSauronx · 2008-06-24 09:01 · Score: 1

hell, if you search google, wikipedia articles are often in the top 10 anyway.

--
By and large, language is a tool for concealing the truth. -- George Carlin
Re:It's easy... by CarpetShark · 2008-06-24 09:26 · Score: 1

This is true enough, but you can never really trust google's index to be complete and up to date for any particular website. Plus, it's a generic search engine, which probably misses a lot of specialised features that a custom-built search engine for a site like wikipedia could have -- consideration of whether a word is just in the page, or is named on the page as a category that article belongs to, for instance.
Re:It's easy... by Hillgiant · 2008-06-24 10:59 · Score: 1

My point precisely.

--
-
Re:It's easy... by Hillgiant · 2008-06-24 11:02 · Score: 1

90% of my search needs are met by typing http://en.wikipedia.org/wiki/[thing i want to know about] in the address bar. For the rest, google does nicely. I am not a hard core reference geek, so... i dunno.

--
-
Re:It's easy... by Res3000 · 2008-06-24 22:46 · Score: 1

I usually just type in Firefox 'wiki [thing I like to know]' (the same as typing it into the Google search field and clicking "I'm feeling lucky") and it gets me directly to the wiki page.

moral: it's easy to be third rate by Anonymous Coward · 2008-06-24 06:46 · Score: 0

1. Millions of static pages can be served at a very high rate from a single modern server.

2. Editing is basically (a) get token (b) edit page (c) submit revisions with token (d) hope you didn't conflict with someone else's edits, in which case you've got to manually fix things.

3. Lack of in-order human oversight. Wikipedia is powered by a gaggle of zealots, not organised humans, and the rule is "latest change produces current page". That's way more easy to implement than a system which involves some sort of review process.

4. Wikipedia operates like a religion with volunteer ministers and one charismatic leader. To paraphrase Bush, it's a whole lot easier to run a group when there's just one dictator and everyone's working toward his whims. "Lowest common denominator fits all" is very easy to engineer but rarely produces progress.

5. Because Wikipedia is operated as a religion rather than a business or charity, no-one gets hurt (except the charismatic leader) if there's data loss or failure, and volunteers are very tolerant of what they're given. It's unnecessary to implement the kind of safeguards against financial loss that any site of Wikipedia's size would normally have to implement.

In other news, a modern desktop can have n people logged in simultaneously typing `less ObjectivismIsAboutFreeWorkers.txt' while another n/100 are in the middle of `vi ObjectivismIsAboutFreeWorkers.txt'.

They were distributed at one time by Anonymous Coward · 2008-06-24 06:56 · Score: 0

This is not the first article on Wikipedia's infrastructure to grace Slashdot.

I seem to remember some data distribution (DB replicants) in other parts of the world.

I could be wrong!

It's called proxy server. by Tweenk · 2008-06-24 07:03 · Score: 1

In the early days of the WWW the idea with popular pages was that they could be cached all over the internet. Your server checks with their server and if it has the page in cache already then that is what gets served up.
This is called "proxy server". Ask your ISP whether they have one. By taking this a bit further where multiple proxies can exchange data directly we have a distributed Web cache. See www.coralcdn.org for an example of that. It works on Wikipedia pages too.

--
Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.

Obviously if you're not in Silicon Valley by heroine · 2008-06-24 07:30 · Score: 1

Obviously U can pay much less outside Silicon Valley. If you want investment capital & lots of customers you have to be physically in Silicon Valley and pay the millions of dollars. Even Kiwipedia had to move its office to San Francisco & the data center is going to follow if they can get enough donations.

Re:Obviously if you're not in Silicon Valley by Anonymous Coward · 2008-06-24 07:53 · Score: 0

Learn how to spell the word 'you', dumbshit.

old article, but explains the process simply by rootpassbird · 2008-06-24 07:41 · Score: 1

http://www.goldmark.org/netrants/webstats/
Browser Cache
Local site cache
Local regional cache
Large regional cache

ummm.. by the way, you /could/ use mediawiki as a quick-and-dirty source code versioning system as long as there's only a few members in the team and/or the code is small - maybe a few ten thousand lines of code totally.

Wonderful history and diff built-in, web-access fit in documents wherever you want. Effective in certain situations.

--
Hackers have long memories. It works both ways.

Re:Off-topic, I know, but...what about /.'s hardwa by dubl-u · 2008-06-24 07:50 · Score: 1

Remember when CmdrTaco called wikipedia a fad and said they couldn't scale?
Plenty of smart people said that. Even some people working on the project suspected that.

Wikipedia was in theory impossible, and unproven in practice. Even now, the main difference is that most people just accept that it works without understanding how.

What about the Internet Archive by Xtifr · 2008-06-24 07:51 · Score: 5, Informative

Wikipedia's pretty impressive, but how about the Internet Archive? Also a non-profit that doesn't run ads, and not only do they, like Google and Yahoo, "download the Internet" on a regular basis, but the Archive makes backups! Plus, they have huge amounts of streaming audio and video (pd or creative-commons). The first time I ever heard the word "Petabyte" being discussed in practical, real world terms (as in, "we're taking delivery next month") was in connection with the Internet Archive. Several years ago. And it was being used in the plural! :)

They may not have as much incoming traffic as Wikipedia, but the sheer volume of data they manage is truly staggering. (Heck, they have multiple copies of Wikipedia!) When I do download something from there, it's typically in the 80-150 MB range, and 1 or 2 GB in a pop isn't unusual, and I know I'm not the only one downloading, so their bandwidth bills must still be pretty impressive.

The fact that these two sites manage to survive and thrive the way they do never ceases to amaze me.

Wikipedia wastes disk space on over 116000 stubs by Anonymous Coward · 2008-06-24 08:00 · Score: 0

Wikipedia has a version in volapuk (a conlang with just 20 speakers), which has over 116,000 articles generated by a bot.

Nice article but- by skylinkdave · 2008-06-24 08:04 · Score: 0, Redundant

I wanted pictures :(

Re:Lego instructions online by againjj · 2008-06-24 08:29 · Score: 1

You posted to the wrong article. You meant to post to this one.

Re:Lego instructions online by rbeattie · 2008-06-24 08:31 · Score: 1

Dammit! Got my tabs confused. :-)

Thanks!

-Russ

--
Me

That's easier than it sounds by Cajun+Hell · 2008-06-24 08:32 · Score: 2, Funny

I don't care how few servers they have, whats more interesting to me is that they run an ultra-high traffic site, which they aren't having trouble paying for, and do it without adds.

I can do that too; I just emulate the adds. x+y is the same as x-(0-y). You have to be careful to use signed numbers for everything (or else have a lot of casting), but that's not really all that hard.

--
"Believe me!" -- Donald Trump

Article is wrong by Anonymous Coward · 2008-06-24 08:53 · Score: 0

Wikipedia has many sites besides FL: http://meta.wikimedia.org/wiki/Wikimedia_servers

that's what he means by static page by Trepidity · 2008-06-24 09:01 · Score: 1

MediaWiki doesn't literally generate static HTML pages because it doesn't need to, since it's designed to be used with the rest of the infrastructure. The "static pages" are the ones served by the squid clusters, which is simpler architecturally (and more distributed) than having the core software literally generate static HTML pages. And the vast majority of Wikipedia pageviews are these static pages served out of squids.
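
A toy model of that arrangement, with a dict standing in for the squid layer. It assumes, as other comments in this thread suggest, that logged-in requests go to the backend and that an edit purges the cached copy; the function names are made up for illustration:

```python
backend_hits = 0
proxy_cache = {}  # stand-in for the squid cache, keyed by URL path


def backend_render(path, user):
    """Hypothetical backend: the PHP and database work done on a cache miss."""
    global backend_hits
    backend_hits += 1
    greeting = f" (logged in as {user})" if user else ""
    return f"page for {path}{greeting}"


def handle_request(path, user=None):
    """Serve anonymous reads from the cache; pass logged-in users through."""
    if user is not None:
        return backend_render(path, user)
    if path not in proxy_cache:
        proxy_cache[path] = backend_render(path, None)
    return proxy_cache[path]


def purge(path):
    """Called after an edit so the next anonymous read is regenerated."""
    proxy_cache.pop(path, None)


for _ in range(10_000):
    handle_request("/wiki/Gold")            # anonymous readers
handle_request("/wiki/Gold", user="Ann")    # one logged-in reader
purge("/wiki/Gold")                         # someone saved an edit
handle_request("/wiki/Gold")                # first anonymous read after the edit
```

Ten thousand anonymous reads, one logged-in read and one edit add up to three trips to the backend in this model.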

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

the vast majority of people aren't logged in by Trepidity · 2008-06-24 09:03 · Score: 1

"Almost all" Wikipedia pageviews are cached static HTML served up by a squid proxy, because there are orders of magnitude more non-logged-in readers than logged-in users, and many orders of magnitude more reads than edits. Only a small minority of traffic hits the database at all.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Internet archive is low traffic compared to Wiki by Anonymous Coward · 2008-06-24 09:49 · Score: 0

Wikipedia handles about 5x the number of mbit/sec of the Internet archive, and since Wikipedia's pages are tiny it takes Wikipedia a lot more work for every bit sent. Wikipedia also does it with something like 1/10th the budget. ... why Wikipedia only has 1/10th the budget is a problem left up to the reader.

Google Should Give Wikipedia $50 Million by Doc+Ruby · 2008-06-24 10:05 · Score: 1

Maps.Google.com now includes a Wikipedia article layer along with other layers like traffic, terrain, streets and satellites. The layer is referred to as "Wikipedia"; the articles are shown as Wikipedia's trademarked logo icon on the map. Click an icon and its linked article content pops up right in the browser (or Google Earth, if that's the viewer you're using) window.

That's fair use of Wikipedia's open content, so Google isn't required to pay Wikipedia a license fee or anything. But Google is obviously getting a huge value out of including Wikipedia content in Google's app and UI, including the Wikipedia logo, for which Google is making $BILLIONS a year, and its place in the stock market protected by cobranding with Wikipedia. I see no sign that Google is paying Wikipedia for all that traffic Google gets paid for which Wikipedia must pay to support on Wikipedia's servers.

Google's Maps pages all say at their bottom "", but Wikipedia isn't even mentioned. The Where does Google Maps get its information? "Help" page credits NAVTEQ, TeleAtlas, DigitalGlobe and MDA Federal, but not Wikipedia. The detailed instructions on using the Wikipedia layer and others doesn't credit Wikipedia, just takes credit for exposing it.

That's an excellent feature of Google Maps, and probably completely blows away competitors like MapQuest and Yahoo Maps. Google should pay Wikipedia whatever it costs to operate the servers that are making Google so many $billions, and even more to keep Wikipedia the excellent resource that Google exploits so well. Probably at least $50 million a year would be good, and just another investment in Google's auxiliary infrastructure.

Or Google could just be evil and get it for free while millions of other people pay Google's tab.

--
make install -not war

Re:Google Should Give Wikipedia $50 Million by bobbozzo · 2008-06-24 11:02 · Score: 1

IIRC, Google (and Yahoo) are big contributors to the Wikimedia foundation.

--
Nothing to see here; Move along.
Re:Google Should Give Wikipedia $50 Million by Doc+Ruby · 2008-06-24 11:13 · Score: 1

I don't think so. The Wikimedia Foundation's benefactors page shows no sign of Google, though Yahoo is listed as contributing hosting services in some unspecified amount.
The top benefactors give anywhere from $1M (for each of 3 years) to $100K (or some unspecified matching fund). I expect that if Google donated $50M, it would appear prominently on that page. Hell, if Google donated $100,000 it would be towards the top of that all too brief page.

--
make install -not war
Re:Google Should Give Wikipedia $50 Million by bobbozzo · 2008-06-24 18:26 · Score: 1

This is from 2005, but there's a Slashdot article titled "Google donating bandwidth and servers to Wikipedia".

--
Nothing to see here; Move along.
Re:Google Should Give Wikipedia $50 Million by TheRaven64 · 2008-06-24 22:51 · Score: 1

As I recall, Wikipedia turned down Google's offer because they thought it would harm their independence.

--
I am TheRaven on Soylent News

Single datacenter? by Anonymous Coward · 2008-06-24 10:14 · Score: 0

Wikipedia's infrastructure runs on fewer than 300 servers housed in a single data center in Tampa, Fla.

Err, no. There's also a cluster at Kennisnet in the Netherlands, and one provided by Yahoo in South Korea or so.

But then, this "article" was really one of the most pointless things I've read in a long time, anyway - all it consisted of were some numbers (interesting, admittedly, but not for more than a few seconds), a description of Wikipedia that sounds like it was written by a third-grader ("This is Wikipedia. Wikipedia runs on MySQL. Run, Wikipedia, run!"), and some links to actual presentations.

Why not cut out this middleman and directly link to those? Oh, wait, they're from last year, so this isn't even news. My bad.

Re:Internet archive is low traffic compared to Wik by Anonymous Coward · 2008-06-24 10:49 · Score: 0

Internet archive low traffic compared to Wiki?

{{fact}}?

mod parent up by Anonymous Coward · 2008-06-24 13:00 · Score: 0

+1

Isn't it sometimes about measuring unreliability? by jesterzog · 2008-06-24 13:01 · Score: 1

And if there is a 1-hour downtime, EVER, you just blew through the scheduled downtime for the next 120 years. "Six nines" is meaningless. Unrealistic.

Setting aside the arguments that maybe you can have that kind of uptime with certain setups and clauses, I did think that these requirements were also often there so that it'd be clear to both sides when a sales company owed a user company some kind of compensation. I don't think either would expect it to be that reliable, but it gives a measurement system for deciding just how much money is owed, as long as there's agreement on how to interpret it. I.e., if it's down for longer than 30 seconds this year, B will pay A ((#seconds-30) * $some-amount). If it's down for longer than 5 minutes, perhaps they'll switch to a different scale.
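For anyone who wants to check the arithmetic, here is a rough sketch of both numbers: the downtime a given number of nines allows per year, and the compensation formula above. The function names, the 30-second allowance default and the rate of $1 per second are placeholders for illustration, not terms from any real contract.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years


def allowed_downtime_seconds(availability):
    """Downtime budget per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def penalty(downtime_seconds, allowance_seconds=30, rate_per_second=1.0):
    """(#seconds - allowance) * rate, and nothing owed inside the allowance."""
    return max(0, downtime_seconds - allowance_seconds) * rate_per_second


six_nines_budget = allowed_downtime_seconds(0.999999)

# How many years of six-nines budget a single one-hour outage uses up.
years_used_by_one_hour = 3600 / six_nines_budget

print(round(six_nines_budget, 1), "seconds of downtime allowed per year")
print(round(years_used_by_one_hour), "years of budget used by a 1-hour outage")
print(penalty(90), "owed for 90 seconds of downtime at the placeholder rate")
```

Six nines works out to roughly 31.5 seconds a year, so a single one-hour outage eats more than a century of budget, which is in the same ballpark as the figure in the quote.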

I think you'd find that a lot of companies are prepared for the system not to live up to that particular requirement, no matter which side of the deal they're on.

Where are the PHP/MySQL doom criers? by trawg · 2008-06-24 13:48 · Score: 2, Insightful

I notice they are conspicuously absent in the comments. They tend to jump up and down in any other post about PHP and MySQL. This is such a great example of the scalability and performance of it WHEN USED CORRECTLY.

Re:Where are the PHP/MySQL doom criers? by dark_banishing · 2008-06-24 20:55 · Score: 1

I think their point is that there are a million ways to go wrong with PHP and just one to do everything right.

Wikipedia surely picked the good one.
Re:Where are the PHP/MySQL doom criers? by TheRaven64 · 2008-06-24 22:53 · Score: 1

This is exactly the sort of workload MySQL is good for - heavily read biased, simple queries and no data integrity concerns. As for PHP, well, Mediawiki is some horrible code, but it's fairly simple and so doesn't place very heavy demands on the implementation language.

--
I am TheRaven on Soylent News
Re:Where are the PHP/MySQL doom criers? by adri · 2008-06-24 23:42 · Score: 1

Bullshit. It's a great example of PHP, MySQL -and- caching.
Stock PHP and MySQL by themselves would be (a) useless, and (b) unable to keep up with the load. Their architecture doesn't distribute the content all over the world; trying to keep MySQL servers in-sync across the entire planet would be hilarious. Trying to convince PHP in its stock form to generate that much content would require an enormous amount of servers and some form of SQL caching layer because MySQL isn't designed for that - hence why people roll memcached in a lot of situations.
(There are places which run a very, very hacked up PHP to get stupendously high speeds out of it, but it ain't your daddy's PHP..)
Re:Where are the PHP/MySQL doom criers? by trawg · 2008-06-25 20:04 · Score: 1

heh true dat, but the same can be said of just about anything

Re:Off-topic, I know, but...what about /.'s hardwa by Bill,+Shooter+of+Bul · 2008-06-24 16:31 · Score: 1

pretty ironic, when you think about it. A site as incredibly useful as Wikipedia scales nicely, Twitter not so much. I like that kind of irony.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

Re:Internet archive is low traffic compared to Wik by TheRaven64 · 2008-06-24 22:49 · Score: 1

Citation needed here, I think. While I visit Wikipedia a lot more often than archive.org, I've downloaded a few 4GB films from archive.org, and so the total amount of traffic I've generated to them dwarfs the total wikipedia usage of most people I know (and I know a few other people who have downloaded public domain films from archive.org).

--
I am TheRaven on Soylent News

WEB 2.0, not Net 2.0 by Nicolas+MONNET · 2008-06-25 06:36 · Score: 1

Wikipedia is all user-generated content.
Web 1.0 contains only marginal amounts of user-generated content.

Slashdot doesn't kill servers -- apps and database by patio11 · 2008-06-25 22:11 · Score: 1

If you take a $20 a month VPS (or for that matter a $5 a month GoDaddy shared hosting account) serving static CSS/HTML with a few images, on Apache, then you can take a Slashdotting straight to the head without any issue whatsoever. Apache will put your 200 kb of content into memory, it gets served out as fast as the connections come in, you win.

Then point the same Slashdotting at, e.g., a page which requires a minor .1 second hop to the database to render and BAM Slashdotting. Similarly, if you've got a heavy media object (e.g. videos hosted locally), you'll probably saturate your bandwidth.
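A back-of-the-envelope sketch of why the database hop hurts: if every request holds a worker for the duration of that hop, throughput is capped at workers divided by time per request. The worker count and the incoming request rate below are invented numbers, not measurements of any real Slashdotting.

```python
def max_requests_per_second(workers, seconds_per_request):
    """Upper bound on throughput when each request ties up one worker."""
    return workers / seconds_per_request


def is_saturated(incoming_rps, workers, seconds_per_request):
    """True when requests arrive faster than the workers can finish them."""
    return incoming_rps > max_requests_per_second(workers, seconds_per_request)


# Assumed setup: 10 workers, 0.1 s spent waiting on the database per page.
dynamic_limit = max_requests_per_second(workers=10, seconds_per_request=0.1)

# Same box serving from memory, assuming about 1 ms per request.
static_limit = max_requests_per_second(workers=10, seconds_per_request=0.001)

print(dynamic_limit, static_limit)
print(is_saturated(incoming_rps=500, workers=10, seconds_per_request=0.1))
```

With these made-up figures the dynamic page tops out around 100 requests per second while the static one has about a hundred times the headroom, so the same burst that swamps the first barely registers on the second.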

Ironically, Slashdot is probably more capable of taking out sites these days than it was previously, not because servers are slower (they're much, much faster) or server code is worse (it's much, much better) but because the average complexity of the typical website is growing.

Compare a default WordPress install (which doesn't cache anything because, hey, who needs to cache operations that inexpensive? It's not like you were expecting to get popular...) to a static HTML page written in Notepad, which was the standard I-can't-believe-it's-not-blog format in 1996. If you fire a Slashdotting at the WordPress install, PHP will cause your RAM utilization to go to "lots" and you will likely either get killed by your host or see the majority of visitors get timed out. If you fire it at the static HTML page, no worries.

--
Help poke pirates in the eyepatch, arr.

OFFTOPIC! by Anonymous Coward · 2008-06-26 12:22 · Score: 0

hey, are you THE bloodninja? the famous bloodninja from the cybersex logs? if so, AHAH!


240 comments