Dvorak on Google and Wikipedia

Posted by CowboyNeal on Tuesday February 15, 2005 @03:55AM from the looking-for-strings dept.

cryptoluddite writes "PC Magazine has an article by John C. Dvorak expanding on the community discussion of Google's offer for free web hosting of Wikipedia. Those against the deal point out that Google may be planning to co-opt the encyclopedia as Googlepedia (by restricting access to the complete database). In a revealing speech given by the Google founders, Larry Page says he would 'like to see a model where you can buy into the world's content. Let's say you pay $20 per month.' Should public domain information be free?" It paints a pretty scary scenario, but one can hardly take a speech from 2001 as serious evidence these days. Update: 02/16 20:16 GMT by T: This story inadvertently links to the second page of the column; here's a link to the first page.

12 of 449 comments

Is this just alarmist talk from a doomsayer? by StateOfTheUnion · 2005-02-15 04:03 · Score: 5, Informative

Those against the deal point out that Google may be planning to co-opt the encyclopedia as Googlepedia (by restricting access to the complete database).
Can they do that? Wikipedia is governed by the GNU Free Documentation License (Wikipedia details here).
This is why Jimbo didn't want the details to leak by Neophytus · 2005-02-15 04:03 · Score: 5, Informative

Speculation is running rife. I guess security through, well... not-very-obscurity is bound to get someone chatting in the end.

In the short to medium term, the deal with Wikipedia is expected to be the provision of about a dozen caching servers. Google would not do any actual database work. There is already a small (three-server) Squid cluster in Paris that does this for users in the UK and France, saving some transatlantic bandwidth.
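To make the bandwidth argument concrete, here is a minimal Python sketch of what an edge caching server does: repeat requests for a page are answered locally, and only cache misses travel back to the origin servers. This is not Squid's actual design; the `CachingProxy` class, the `fetch_from_origin` callback, the capacity, and the LRU eviction policy are all illustrative assumptions, and a real Squid node also deals with HTTP cache headers, expiry, and inter-cache protocols.

```python
from collections import OrderedDict
from typing import Callable


class CachingProxy:
    """Toy edge cache in the spirit of a Squid node (greatly simplified).

    Hits are answered locally; only misses go back over the long-haul
    link to the origin servers.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 1000):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetch callback
        self.capacity = capacity                    # max number of cached pages
        self.cache = OrderedDict()                  # url -> body, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.cache:
            self.hits += 1
            self.cache.move_to_end(url)  # mark as most recently used
            return self.cache[url]
        # Cache miss: the only case that costs origin (e.g. transatlantic) bandwidth.
        self.misses += 1
        body = self.fetch_from_origin(url)
        self.cache[url] = body
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used page
        return body

    def purge(self, url: str) -> None:
        """Drop a page (e.g. after an edit) so the next request refetches it."""
        self.cache.pop(url, None)


# Usage: the second request for the same page never reaches the origin.
origin_requests = []

def fetch(url: str) -> bytes:
    origin_requests.append(url)
    return f"<html>{url}</html>".encode()

proxy = CachingProxy(fetch, capacity=2)
proxy.get("/wiki/Squid")
proxy.get("/wiki/Squid")
print(proxy.hits, proxy.misses, origin_requests)  # 1 1 ['/wiki/Squid']
```

The more popular a page is, the larger the share of its traffic a nearby cache absorbs, which is why a handful of caching servers can save a lot of long-distance bandwidth without touching the database at all.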
Google Groups is still Usenet... by GillBates0 · 2005-02-15 04:06 · Score: 5, Informative

As I understand it, Google Groups is just one more interface to Usenet, like the zillion others offered by ISPs, schools, and other servers. The propagation mechanism for messages is still the same; they just offered a way for people to access News through a web-based interface (lots of other sites offer this) rather than through a regular newsreader (rtin, etc.).
I'm fine with Google offering a faster mirror/interface to Wikipedia, because mirroring of information is always good. From the last /. article on the subject, I gathered that Google would offer their faster processing power and ub3r bandwidth to Wikipedia, but that doesn't necessarily mean they get to hijack the content; they'd just provide a faster way to get to information that's mirrored elsewhere.

--
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Dirty tricks 101: quotes out of context by saddino · 2005-02-15 04:11 · Score: 5, Informative

In a revealing speech given by the Google founders, Larry Page says he would 'like to see a model where you can buy into the world's content. Let's say you pay $20 per month.

The only thing "revealing" about that article is that Page continues: "Somebody else needs to figure out how to reward all the people who create the things that you use." In other words, what Page would like to see is a system where "users" pay for accessing content and "contributors" are paid for providing it.

This /. story could have equally read "Does Google Want to Pay Wiki authors?" but of course, that would have derailed cryptoluddite's agenda to smear Google.

To the editors: in the future, when you see the words "may be planning", just ignore the submission. TIA.
Re:Is this just alarmist talk from a doomsayer? by tdvaughan · 2005-02-15 04:18 · Score: 3, Informative

Since the copyrights are owned by the people who contribute to the articles, Google would have to contact each of them and ask them to relicense their contributions under a less permissive license. It's a bit like when that dude asked if a Linux kernel snapshot could be released under a BSD license for $50,000. Not going to happen.
Re:Licensing? by pohl · 2005-02-15 04:29 · Score: 4, Informative

Not only that, but the open content license also allows Google to profit from providing premium access (read: low-latency) to their own instance of the content. This sort of scenario was anticipated from the beginning when the content license was discussed, and it was considered to be an indicator of success.

--
The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
From the speech in question by Infonaut · 2005-02-15 04:31 · Score: 3, Informative

"One risk of that is that people don't get paid for their content, which is clearly a problem. I'd personally like to see a model where you can buy into the world's content. Let's say you pay $20 per month and get access to the world. Somebody else needs to figure out how to reward all the people who create the things that you use."
It seems to me that they're talking about copyrighted content here. Rather than concocting a plan to bundle up free content and make people pay Google for access, it looks to me like Page was actually talking about reasonable means of access to copyrighted information.

--
Read the EFF's Fair Use FAQ
This is all fud by Raul654 · 2005-02-15 04:40 · Score: 4, Informative

First, full disclosure: I'm a longtime Wikipedia user, and I probably accidentally played a peripheral role in breaking this story. I first heard about the Google deal back in July. Google is not the first company to offer to host Wikipedia. The typical offer comes from "Mom and Pop ISPs" (Jimbo's words) that really don't have any idea what they're getting themselves into (1,400 hits/sec is a hell of a lot to do for free). What I have to say in reply to this story is that it is, IMHO, total FUD. It's completely hypothetical, and it's unrealistic. You have to remember that all the text on Wikipedia is licensed under the GNU Free Documentation License or is in the public domain; all the images and audio are licensed under the GNU Free Documentation License, CC-by-SA, or some equivalent liberal license. So even if, on the off chance, Google succumbs to corporate pressure to be evil, anyone can take the text and reuse it in less evil ways. Furthermore, I trust Jimbo, Angela, and Anthere (the visible members of the board) to make sure the deal with Google is done right by the rest of us contributors. There's a long history on Wikipedia of being against ads of any form; the Spanish Wikipedia forked several years ago over a hypothetical discussion of them.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Wrong. by Raul654 · 2005-02-15 04:44 · Score: 5, Informative

You're wrong. The problem wasn't that we didn't have enough servers, but that the servers we had were misconfigured. The slowness experienced in January was resolved when the configuration bugs were ironed out. The problem is a lack of skilled sysadmins and developers. (And for the record, we just put in an order for 10 more servers.)

Re:Licensing? by Raul654 · 2005-02-15 04:47 · Score: 4, Informative

They cannot. This article is nonsensical FUD from someone who doesn't know what he is talking about. (--A Wikipedia admin)

paranoid and poorly researched by maveric149 · 2005-02-15 07:12 · Score: 3, Informative

First, any offer of hosting by Google, or anybody else for that matter, will not make the 40 or so servers that the Wikimedia Foundation already owns go away, or stop the foundation from paying its own hosting costs for those servers. Nor will it stop donations from coming in so the foundation can buy more hardware and bandwidth. And the foundation is *not* going to just rely on any one hosting partner, but will instead seek out and act upon multiple offers (this is in fact necessary due to the exponential growth of traffic to the sites it operates, such as Wikipedia.org).

The most glaring omission Dvorak makes is the simple fact that, due to the license Wikipedia uses, it would be impossible for any one company to control it. If the 'end' were really near, somebody with better intentions could just download the *whole* Wikipedia and host it. But it would never come to that, because the foundation would not allow it; its very mission is to ensure free access to the projects it runs.

I'm very disappointed in Dvorak.
What really happened that week by Jamesday · 2005-02-15 18:55 · Score: 3, Informative

You're both a bit right. Here's a quick overview of some of the things that happened during that week:
- New Squid cache servers in Paris. After network bandwidth issues there were resolved, they sped up access in parts of Europe. But they also slowed down all page saves, because a save also tells the Squids to remove (flush) related pages from their caches, and those more distant servers took longer to flush. There can be tens of thousands of flushes to do. Squid flushing/purging is now much faster, and longer term it's being modified to be taken out of the save loop completely. This one really hurt save speed for a while, as the developers sorted out what was happening and improved the way purging was done. Net result: all purging of the Squids is much more efficient, and saves remain faster than they used to be. The way pages are delivered to those who aren't logged in was also improved, so style sheets are now served to them from the Squids; that makes page views for logged-out users less sensitive to Apache web server load.
- Load balancing pain. Load balancing is what chooses which Apache web server gets the next request. The previous system didn't spread load evenly, so many requests were sent to the most heavily loaded Apaches when they should have gone to less loaded ones. It's been a long-standing problem for us. During the week or so you're talking about, we were testing several different replacement load balancing systems to find those that would give a good result:
  - Pen seemed better than running modified Squids on the Apaches but wasn't good enough.
  - Perlbal from the LiveJournal people worked very well and gave a nicely even load balance. Brad and Mark from LJ were very helpful in getting it described and set up. We dedicated two Apache web servers to it in case it used too much CPU. As it turned out, it used only about 10% of each machine. But that left us two Apaches not building pages...
  - Our Squid expert wrote a replacement ICP client to run on the Apaches. That also produced an even load balance, so around the end of the week we switched to it. That freed up those two Apaches to go back to page-building duty. So, until we need its other features, Perlbal isn't in use: not quite the best solution for us today (but it may be in the future).
  So, we left that week with much improved load balancing for the Apaches. Much more consistent page load times now.
- Bugs in the MediaWiki 1.4 beta left the Apaches filling their available child slots. Combined with certain web crawlers, that could sometimes let the crawlers take the site down by leaving few or no free children to handle requests. It also increased Apache load a bit. The most important bugs have been fixed, but there's still an occasional stuck child. To deal with that, we have a script that restarts one Apache web server every 5 minutes or so, so that the number of stuck children can't rise to a troublesome level.
- The crawler/Apache child problem and a too-high setting for maximum children would let the Apache server on some Memcached machines take so much RAM that the system swapped. That was very bad news, because it made Memcached respond very slowly (far too slowly), so all the Apaches filled all their child slots and the site appeared dead. That's been dealt with now. It wasn't a Memcached issue as such, just the usual "don't let the box swap to death" situation. While tracking this down, assorted other Memcached-related things were improved.
- As a temporary workaround, Apache was stopped on the two Memcached machines we were using at that time. That left us four Apaches short for a few days. That pushed us past the critical Apache CPU shortage point, and Apache load and wait times rose significantly. Better than a dead site, but not at all good. Beyond a certain point, response time doesn't degrade linearly as machines are lost, and taking four more Apaches out of service pushed us past that point.
- Also during that week the handling of database updates was substantially improved, so lock waits are mostly gone, wh