Scroogle is back. Thanks to help from three Scroogle users, I learned that the same simple interface can be reached through www.google.com/search by adding an extra parameter to the URL (&output=ie), instead of through the former static page www.google.com/ie, which needed no extra parameter. Both methods appear to amount to the same thing.
I apologize for the title, "Scroogle has been blocked." It came from an old template, after which the program went on to read a current text file. In the future the title will read, "Scroogle is having problems with Google." We were IP-blocked by Google more than once a couple of years ago, but not all of our servers were blocked at the same time, and we rerouted traffic, so no one noticed. We got those blocks lifted by Google within a few days.
-- Daniel Brandt, Scroogle programmer and sysadmin; president of nonprofit public charity Public Information Research, Inc., owner of Scroogle.org
A site that started last week, AVG Watch, is collecting the IP addresses of AVG LinkScanner users who visit two other sites run by the same people. After three days, they have 21,000 addresses.
Strangely enough, HyperCard didn't crush the corporations like Kevin Kelly promised it would:
"HyperCard is uniquely suited for activist causes. It goes without saying that its great ease of use and flexibility favors the underdog. Activist groups have often relied on people power and maneuverability to counteract the brute economic and political force of various Powers-That-Be; HyperCard can enhance both of these advantages."
This quotation is from page 164 of "Signal: Communication Tools for the Information Age," Kevin Kelly, editor. Foreword by Stewart Brand. A Whole Earth Catalog. Point Foundation, 1988.
I think the Pakistan situation may be related to what I'm seeing with my google-watch.org traffic. I've been running this site since 2002. The home page usually gets around 1,000 hits a day, and it shows anti-Google and anti-Wikipedia cartoons on a rotating basis.
Yesterday I noticed a surge in home page traffic:
2008-02-14 1152
2008-02-15 1062
2008-02-16 828
2008-02-17 949
2008-02-18 1053
2008-02-19 1179
2008-02-20 987
2008-02-21 1103
2008-02-22 1031
2008-02-23 2274
2008-02-24 6873
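To put a number on the surge, here is a small Python calculation over the daily figures above. Treating February 14 through 22 as the "normal" baseline is my assumption; the arithmetic itself uses only the counts listed.

```python
from statistics import mean

# Daily home-page hits for google-watch.org, as listed above.
DAILY_HITS = [
    ("2008-02-14", 1152),
    ("2008-02-15", 1062),
    ("2008-02-16", 828),
    ("2008-02-17", 949),
    ("2008-02-18", 1053),
    ("2008-02-19", 1179),
    ("2008-02-20", 987),
    ("2008-02-21", 1103),
    ("2008-02-22", 1031),
    ("2008-02-23", 2274),
    ("2008-02-24", 6873),
]

# Assumption: Feb 14-22 is the baseline before the surge.
# ISO dates compare correctly as plain strings.
baseline = mean(hits for day, hits in DAILY_HITS if day <= "2008-02-22")

def surge_ratio(day):
    """How many times the baseline a given day's traffic was."""
    return dict(DAILY_HITS)[day] / baseline

print(round(baseline))                      # 1038
print(round(surge_ratio("2008-02-23"), 1))  # 2.2
print(round(surge_ratio("2008-02-24"), 1))  # 6.6
```

So the last day is roughly six and a half times a normal day, which matches the "around 1,000 a day" rule of thumb for the baseline.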
This traffic comes in without a referrer, and almost none of it goes deeper into the site. The IP addresses are from all over. I have not been able to find any evidence of significant news coverage in the last few days that mentions Google-Watch.org without linking to it. (If it had been mentioned without a link, folks typing or pasting the address into their browser would show up in my logs without a referrer.)
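Checking for this pattern in an access log is straightforward. The Python sketch below assumes an Apache-style "combined" log format, where a missing referrer is logged as "-". The sample lines, IP addresses, and the /some-page.html path are invented for illustration.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

def referrerless_home_visitors(lines):
    """Return (visitors hitting / with no referrer, how many of them went deeper)."""
    home_no_ref, deeper = set(), set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if m["path"] == "/" and m["referrer"] in ("", "-"):
            home_no_ref.add(m["ip"])
        elif m["path"] != "/":
            deeper.add(m["ip"])
    return len(home_no_ref), len(home_no_ref & deeper)

# Invented sample lines using documentation-range addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [24/Feb/2008:10:00:00 -0600] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '198.51.100.7 - - [24/Feb/2008:10:00:02 -0600] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [24/Feb/2008:10:00:09 -0600] "GET /some-page.html HTTP/1.1" 200 8000 "http://www.google-watch.org/" "Mozilla/5.0"',
    '192.0.2.44 - - [24/Feb/2008:10:00:15 -0600] "GET / HTTP/1.1" 200 5120 "-" "Opera/9.25"',
]

print(referrerless_home_visitors(SAMPLE_LOG))  # (3, 1)
```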
Could it be related to the Pakistani routing situation? The redirect might be happening in some small corner of the network, by someone with access to a router -- someone who is sympathetic to those objecting to depictions of the Prophet Muhammad. Is this tit for tat, trading Muhammad depictions for anti-Google and anti-Wikipedia cartoons?
I added a second static IP address to google-watch.org yesterday, for round-robin load sharing. The load immediately split between the two servers, as expected. My conclusion is that the redirect requires a DNS lookup by the end user. If it were a redirect to a fixed IP address, this unusual traffic would not have found the second server.
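The reasoning rests on how round-robin DNS behaves. When one name has two A records, clients that look the name up are spread across both addresses, while traffic aimed at a hard-coded IP address never sees the second one. The Python toy model below illustrates that under simplifying assumptions: no caching, every client uses the first address in the answer, and the addresses come from the RFC 5737 documentation ranges rather than the real servers.

```python
import itertools

# Stand-in addresses for the two servers (documentation range, not the real ones).
SERVERS = ["192.0.2.10", "192.0.2.20"]

def round_robin_resolver(addresses):
    """Toy DNS server: each lookup returns the A records rotated by one."""
    rotation = itertools.cycle(range(len(addresses)))
    def resolve(name):
        start = next(rotation)
        return addresses[start:] + addresses[:start]
    return resolve

def count_hits(n_clients, pick_server):
    """Tally which server each simulated client connects to."""
    hits = {ip: 0 for ip in SERVERS}
    for _ in range(n_clients):
        hits[pick_server()] += 1
    return hits

resolve = round_robin_resolver(SERVERS)
# Clients that do a DNS lookup connect to the first address in each answer.
via_dns = count_hits(1000, lambda: resolve("google-watch.org")[0])
# Clients redirected to a fixed IP never consult DNS at all.
via_static_ip = count_hits(1000, lambda: SERVERS[0])

print(via_dns)        # {'192.0.2.10': 500, '192.0.2.20': 500}
print(via_static_ip)  # {'192.0.2.10': 1000, '192.0.2.20': 0}
```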
Zoeller was libeled, and it appears that it was done by an employee at work. The company doesn't deserve to take the rap for this any more than Zoeller should have to put up with it. The vandal should be identified for the record. Wikipedia has hidden the evidence, but some of it was captured before they did this. It is linked at the top of http://www.wikipedia-watch.org/
One of the top administrators at Wikipedia goes by the name of Essjay. In an article by Stacy Schiff, a Pulitzer Prize-winning writer, Essjay is described as follows in the July 31, 2006 issue of The New Yorker magazine:
"One regular on the site is a user known as Essjay, who holds a Ph.D. in theology and a degree in canon law and has written or contributed to sixteen thousand entries. A tenured professor of religion at a private university, Essjay made his first edit in February, 2005.... Essjay is serving a second term as chair of the mediation committee. He is also an admin, a bureaucrat, and a checkuser, which means that he is one of fourteen Wikipedians authorized to trace I.P. addresses in cases of suspected abuse. He often takes his laptop to class, so that he can be available to Wikipedians while giving a quiz, and he keeps an eye on twenty I.R.C. chat channels, where users often trade gossip about abuses they have witnessed."
The information in The New Yorker came from his user page that he developed over the previous year. He pushed all the correct Wikipedia buttons: he said he was gay, an expert on Catholicism but an elder in a liberal Protestant church, he and his partner had both a cat and a dog, and he was past 30 but not yet 40. From credentials like this, and from his mind-boggling level of activity on Wikipedia, he became administrator, bureaucrat, checkuser, oversight, and last month was named a community manager at Wikia.
Perhaps because he is employed by Wikia now, Essjay has coughed up his real name. He doesn't have two PhDs, and he isn't a tenured professor. He's a 24-year-old living near Louisville, Kentucky. The New Yorker, famous for its fact-checking, got it all wrong.
Incidents like this illustrate the limitations of the Wikipedia approach. It's not an encyclopedia, but rather it's a video game that escaped from its box, and is now influencing real people in the real world.
For 14 months I've tried to get my bio taken down because I'm not notable. They just laugh at me, and by now there are six long "Talk" pages associated with my bio that are full of insults. It all gets indexed in Google. I'm so non-notable that they cannot find a picture of me anywhere on the web. It doesn't make any difference. When the teenage admins on Wikipedia decide that someone needs to be punished for challenging their right to be anonymously obnoxious and invade my privacy, nothing stands in their way. There are 142,766 biographies of living people in the English edition of Wikipedia, and I promise you that this figure includes a lot of people who would rather not have to watch their biographies get vandalized for the rest of their lives. http://www.wikipedia-watch.org/
Some have expressed an interest in reading what Wikipedia called a "good article," the NPA personality theory article. Here it is, grabbed by Wikipedia-Watch from Google's cache, and stripped of junk added by Google: http://www.wikipedia-watch.org/npa.html
Perhaps I'm mistaken, but you seem to be suggesting that you have to be logged into Gmail while browsing their search engine at the same time. This is not true, unless you are in the habit of deleting your cookies constantly. Google uses the same main cookie with a unique ID in it across all of their *.google.com services. You don't have to be logged into Gmail. They know who you are even when you are logged out of Gmail.
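Why one cookie reaches every *.google.com service follows from the browser's domain-matching rule. Below is a minimal Python sketch of that rule as later standardized in RFC 6265 (section 5.1.3), assuming the cookie was set with a Domain attribute of .google.com. It models what the browser sends, not anything about Google's own code, and the host names are only examples.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Return True if a cookie with this Domain attribute is sent to request_host.

    Follows RFC 6265 section 5.1.3 for host names (IP literals are not handled).
    """
    host = request_host.lower()
    domain = cookie_domain.lower()
    if domain.startswith("."):
        domain = domain[1:]  # RFC 6265 section 5.2.3: a leading dot is ignored
    # Either an exact match, or the host ends with "." followed by the domain.
    return host == domain or host.endswith("." + domain)

for host in ("www.google.com", "mail.google.com", "news.google.com", "evilgoogle.com"):
    print(host, domain_matches(host, ".google.com"))
```

The first three hosts all receive the same cookie, and with it the same unique ID, whether or not you are logged into Gmail. The last one does not, because it is a different domain that merely ends in the same letters.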
Wikipedia has no business creating biographies of living persons without their permission. A case might be made for doing this to a public person such as Kenneth Lay, but when it comes to someone like me, who is not notable, it can be distressing.
Last October I started http://www.wikipedia-watch.org/ because an anonymous administrator at Wikipedia, SlimVirgin, started a biographical article on me without asking, and with a preconceived notion that I was someone who needed to be put in his place because I had political views that were incompatible with hers.
There are more anonymous teenagers who are administrators at Wikipedia than you can believe. The subject of a biography is not allowed to contribute to his own article, and if he tries, they ban him like they banned me, and they also start getting downright nasty.
Wikipedia is okay on pop culture, on high-tech and scientific topics, and also on trivia that no one cares about. But when it comes to social and political issues, and biographies of living persons, what they need is a big fat libel and invasion-of-privacy lawsuit to stop them in their tracks.
It's not an encyclopedia. Instead, it's more like a video game that escaped from its box. It can be devastating to the victim, because all the search engines rank Wikipedia near the top. Human Resources departments are increasingly googling applicants before they call them for interviews.
The entire situation, in which some anonymous teenager has the power to ruin someone's life because it's fun and interesting, has to stop. I don't mind a newspaper reporter writing about me because reporters and newspapers are identifiable and accountable. More importantly, when they write about me it's generally within a very narrow context, and several days later the same newspaper is used to wrap fish. The short shelf life makes a newspaper article, even if it is negative, a non-issue for me.
Wikipedia, by comparison, stays at the top of the engines forever. I have to watch my biography every day, because I never know who might have vandalized or distorted it last night. John Seigenthaler's biography is still getting vandalized, and it's taking three hours before some of them are reverted. His article is one of the most-watched by the vandal patrol of all Wikipedia articles. What about someone like me? If I don't watch it, who will?
I want them to take my biography down, period. I've been trying for nine months. They just laugh at me.
Mozilla Foundation should pay their taxes
Mozilla Foundation has released its Form 990 for 2004. It considers its $4.4 million in income from "search revenues" (apparently all of it from Google) to be part of its exempt function or purpose.
The IRS considers any advertising income realized by a nonprofit to be "unrelated business income" and subject to taxes. Question: What's in the contract with Google that Mozilla signed in 2004? Is it based on AdWords percentages? Opera's 2005 contract with Google works this way, so I assume that Mozilla's contract with Google does also.
Even if unrelated to AdWords, the fact remains that doing any sort of business with Google, which is an ad agency (99 percent of Google's income is from advertising), means that income from Google is unrelated to any nonprofit, exempt function or purpose.
And it's definitely not a donation. A charitable contribution requires that no goods or services are exchanged. The money passed from Google to Mozilla does not qualify as a contribution, because Google has received substantial benefit from the association.
Naughty Mozilla Foundation. They should pay their taxes.
Google Watch also runs Scroogle.org, a proxy that scrapes Google and/or Yahoo. One reply to the post says that I'm still a nut. But while I may or may not be a nut, this reply from an Anonymous Coward is wrong about the cookie. You don't need a globally-unique ID in a cookie to save the user's preferences. That is NOT the primary purpose of the cookie, but rather a convenient cover story for Google. The purpose of the cookie is so that you have a unique ID to tie together the activity of a single person who uses different IP addresses over time.
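The point about tying together one person's activity across changing IP addresses can be sketched in a few lines of Python. The log records below are invented for illustration, and nothing here describes how Google actually stores or processes its logs.

```python
from collections import defaultdict

# Invented search-log records: (cookie_id, ip_address, query).
SEARCH_LOG = [
    ("a1b2c3", "203.0.113.5", "cheap flights"),
    ("a1b2c3", "198.51.100.7", "symptoms of diabetes"),
    ("ffee99", "203.0.113.5", "weather"),
    ("a1b2c3", "192.0.2.44", "my own name"),
]

def group_by(records, key_index):
    """Group queries by one field of each record (0 = cookie ID, 1 = IP address)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)

by_cookie = group_by(SEARCH_LOG, 0)
by_ip = group_by(SEARCH_LOG, 1)

# One browser's queries, stitched together across three different IP addresses.
print(by_cookie["a1b2c3"])
# Grouping by IP alone gives 3 groups: one mixes two users, and user a1b2c3 is split three ways.
print(len(by_ip))
```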
In fact, you don't even need a cookie to save preferences. All you need is a specially-crafted URL that you save as a bookmark.
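A minimal Python sketch of the bookmark approach: the preferences travel in the query string, so nothing identifying has to be stored on your machine. The base URL and the parameter names (num, hl, safe) are placeholders chosen for illustration, not a documented API of any particular engine.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical search endpoint; substitute the engine you actually use.
BASE_URL = "https://search.example.com/search"

def bookmark_url(query, prefs):
    """Build a search URL that carries display preferences in the query string."""
    return BASE_URL + "?" + urlencode({"q": query, **prefs})

def read_prefs(url):
    """What the server sees: the preferences, with no user ID anywhere."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items() if k != "q"}

url = bookmark_url("scroogle", {"num": 100, "hl": "en", "safe": "off"})
print(url)              # https://search.example.com/search?q=scroogle&num=100&hl=en&safe=off
print(read_prefs(url))  # {'num': '100', 'hl': 'en', 'safe': 'off'}
```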
Assuming that you delete your cookie constantly, or use a browser that lets you define your search engine cookie as a session cookie despite the expiration date, then the question becomes, "How do I change my IP address, which tends to be a bit too sticky for my tastes now that I'm on broadband?"
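What "treating a cookie as a session cookie" means in practice: a persistent cookie carries an expires (or Max-Age) attribute, and a cookie without one is discarded when the browser closes. The Python sketch below uses the standard http.cookies module to strip those attributes from an invented Set-Cookie value. It only illustrates the idea and is not how any particular browser implements that option.

```python
from http.cookies import SimpleCookie

# An invented Set-Cookie value with a far-future expiration date.
SET_COOKIE = "ID=abc123; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com"

def as_session_cookie(header_value):
    """Return each cookie as a Set-Cookie string with its expiration attributes removed."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    for morsel in cookie.values():
        morsel["expires"] = ""  # empty attributes are left out of the output
        morsel["max-age"] = ""
    return [morsel.OutputString() for morsel in cookie.values()]

print(as_session_cookie(SET_COOKIE))
```

Note that the unique ID survives; only its lifetime changes, which is why the IP address question comes next.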
Broadband providers have different policies in different parts of the world, or even in different parts of the U.S. But as someone who was recently a Time Warner Cable broadband subscriber and then switched to SBC/Yahoo DSL broadband, it seems to me that the key to getting a fresh IP address, at least in San Antonio, Texas, where I'm located, is to show your provider a different network interface card MAC address.
I have two computers, and when I switched my Ethernet connection to the other computer, both the cable provider and the DSL provider tended to give me a new IP address. You have to power down the modem and the computer while the switch is made, or else one or both might remember the old IP address and cause it to get reassigned. Before powering down your computer, clear your old IP address in the window that shows your network connection, so that when it powers back up it looks for a new address instead of telling the modem what address it used to have.
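Why a different MAC address tends to produce a fresh IP address can be seen in a toy model of a provider's DHCP server that remembers leases by the client's hardware address. This Python sketch is a simplification (no lease expiry, no "requested address" option, documentation-range addresses), not a description of how Time Warner or SBC actually configure their systems.

```python
import ipaddress

class ToyDhcpServer:
    """Hands out addresses from a pool and remembers each lease by MAC address."""

    def __init__(self, network):
        self._pool = iter(ipaddress.ip_network(network).hosts())
        self._leases = {}  # MAC address -> IPv4Address

    def request(self, mac):
        mac = mac.lower()
        if mac not in self._leases:
            self._leases[mac] = next(self._pool)  # new hardware, new address
        return self._leases[mac]                  # known hardware, same ("sticky") address

isp = ToyDhcpServer("198.51.100.0/28")
first = isp.request("00:11:22:33:44:55")  # computer A
again = isp.request("00:11:22:33:44:55")  # computer A after a reboot
other = isp.request("66:77:88:99:AA:BB")  # computer B plugged in instead
print(first, again, other)  # 198.51.100.1 198.51.100.1 198.51.100.2
```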
Yes, your service provider probably has a list of all the IP addresses you ever used, and when you used them. But it's one step more complex for the bad guys to pull together a list like this from your service provider. Without this extra step, the information from Google won't be complete.
Of course, you can use Scroogle.org for your searches and not even worry about this stuff.
There is a chronology of how it was traced at the bottom of this page.
I am no genius. There was one chance in 10,000 that there would be a server on that IP address, and that it would be up when I tried it on impulse (it timed out during nighttime hours all of last week).
Mr. Seigenthaler is very gracious in complimenting me, but I am no genius. Anyone who knows the difference between an IP address and a hot-dog with mustard could have done the same thing. That includes dozens, or maybe hundreds, of Wikipedians. But they didn't bother now, did they?
It was a pleasure to work with Mr. Seigenthaler on this trace. He is an amazing, accomplished person, and I have a huge amount of respect for him. Before his Wikipedia story came out, I wasn't aware of him.
He's the genius, although it is true that I know more about Internet infrastructure than he does. But I know nothing that would impress all the clever Slashdotters reading this, I'm sure.
I've been thinking more lately about the president of the American Library Association, Michael Gorman, and the objections he has to Google Book Search. He's almost the only person I've heard of who objects not on the basis of copyright, but rather on the basis of the atomization of information.
Then I did a search on the name of one of the people behind Google-Watch, and compared Google's snippet containing his name to the actual text from the book. Atomization? Heck, he got completely nuked by the snippet. I fear for the future of education.
Wikipedia, the most-scraped site on the planet, indirectly generates massive numbers of ads. It is inevitable that Jimmy Wales, already a rich man, will want to get richer.
Jimmy Wales runs Wikipedia from the profits that come from Bomis and from donations. Bomis.com is a porn directory network with an innocent-looking front end, and a huge number of ads and paid links.
Wikipedia is straining under the load from a massive increase in traffic. This is due to the buzz from the media, as well as impressive rankings in Yahoo and Google.
Most of the insider administrators are anonymous, and they can use their editing privileges to stomp on any initiatives from the unwashed masses that they find objectionable. The word "cult" comes to mind. Recently there is a move on to require footnote citations for most assertions, in order to make the articles appear neutral. However, in my experience last week with Jimmy and one of his top anonymous admins, SlimVirgin, it seems to me that if the citation itself looks like an opposing opinion, then that's good enough. No one over there actually reads the stuff they cite -- no time for that.
The only defense the unanointed have is to put together their own list of CGI proxies, and give them a hard time for a couple of days. But the admins have many more "rollback" weapons to make it easy to "revert" any changes, which makes this too much trouble for any single unprivileged person.
I predict that before Wikipedia breaks under the traffic load, Jimmy will start running AdSense or Yahoo ads. At that point a lot of editors will probably leave, since their work is volunteer and they might now see Wikipedia as something quite different. Look at what the Google tie-in did for Mozilla Foundation, for example. Potentially millions per year would be generated by ads on Wikipedia.
Then he'll bank most of the money, buy some more bandwidth to keep it going as long as he can, but ultimately let it run down. I don't for a minute believe that Jimmy is motivated by this:
"Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing." -Jimmy Wales, July 2004
Google, and yes, Yahoo and MSN, need to answer at least three crucial questions:
1) Google admits that your search terms are saved along with your unique cookie ID and your IP address, and a time/date stamp. However, they spin this by suggesting that it's merely part of the normal logging process. The question is this: To what extent does Google parse out and database this information for future reference and easy access?
2) Does Google have any data retention policies for various types of data, or do they keep it all forever?
3) On a country-by-country basis, how many requests does Google get from government officials for user information? We don't need names, but we need numbers, so that we can judge the comparative risks of using Google or other search engines for each country. China, for example, is riskier than the U.S., but even in the U.S. it's illegal for Google to disclose a request if it's a national security matter. However, they could reveal statistics if they wanted to.
I wrote to administrators, and to each Regent at the University of Michigan, trying to get them to look at Section 108 of the Copyright Act, which specifies the limits of library copying as an instance of fair use. I believe that Section 108 prohibits U.Michigan from what they are allowing Google to do with their books.
I also wrote to the Authors Guild, expressing my frustration that U.Michigan ignored me. A copy of that letter is here.
Google is a weed growing in the copyright garden. I was thinking that Section 108 might be used to trim back the weed. The Authors Guild is wisely hoping to pull it out by the roots. If that doesn't work, maybe they can trim it back later.
I'm quite happy with the Authors Guild's suit. It looks to me like they know what they're doing. Maybe five years down the road, when the Supremes establish that opt-in also applies to websites, then we'll be able to force robots.txt into an opt-in mode instead of the present opt-out mode. That will fragment the monopoly of the big search engines, and help to give the web back to the webmasters.
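To make the opt-out/opt-in distinction concrete: under today's robots.txt convention, a site that says nothing is open to every crawler, so webmasters have to opt out. The Python sketch below uses the standard urllib.robotparser module to compare that with a hypothetical opt-in file that blocks all crawlers except one the webmaster names. FriendlyBot and the example.com URL are invented.

```python
from urllib.robotparser import RobotFileParser

def robots(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# Opt-out (today): a site with an empty robots.txt is crawlable by everyone.
opt_out = robots("")

# Opt-in (hypothetical): deny every crawler except those the webmaster invites.
opt_in = robots("""\
User-agent: FriendlyBot
Allow: /

User-agent: *
Disallow: /
""")

PAGE = "http://www.example.com/article.html"
for agent in ("FriendlyBot", "Googlebot"):
    print(agent, opt_out.can_fetch(agent, PAGE), opt_in.can_fetch(agent, PAGE))
```

Under the opt-in file, silence from the webmaster means "no," which is the reversal argued for above.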
If Google could show snippets from books without first copying the entire book, and if they did this without any commercial interest or intent, then I think they might have a fair-use argument. But there are some hurdles before they can get to that argument.
Many Google acolytes like to point out that Google already grabs much of the web in its entirety, which is copyrighted by default. That's true. But that doesn't mean it's legal. All it means is that search engines started doing this before webmasters got organized into associations (they still aren't organized), and there was no one to challenge the engines.
Now if webmasters had been around as long as authors, and were organized to protect their interests, the engines would have never gotten this far with their illegal crawling for profit.
For me, the issue is that Google, a rich corporation, has talked some libraries into providing access to their collections, even though the library is not the rights holder for the copyrighted works they own. The library that is most eager to let Google scan everything is the University of Michigan, a public institution.
The contract with U.Michigan was confidential until they posted it in response to a request I filed under Michigan's freedom of information law. Google gets to scan everything, and U.Michigan gets a copy of the scanned files. However, U.Michigan is not able to do anything with their copies except offer them on their own website, assuming that they take measures to prevent excessive downloading and automated crawling.
By way of contrast, Google gets to do anything it wants with its copies, forever, and that includes selling them to partners, or passing them along to any successor of Google. They will show ads for where to buy copies of out-of-print books. The entire book will be scanned, but only snippets will be shown surrounding the search term for books that are in copyright. With this latest announcement, they say that they will not show sponsored links unless the publisher agrees to join in the Google Print program.
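As a rough illustration of "snippets surrounding the search term," the Python sketch below returns a fixed window of words on either side of the first match. The window size and the matching rule are assumptions; Google has not published how it actually cuts its book snippets. The sample sentence is from a public-domain novel.

```python
import re

def snippet(text, term, window=5):
    """Return up to `window` words on each side of the first occurrence of `term`."""
    words = text.split()
    target = term.lower()
    for i, word in enumerate(words):
        # Compare without punctuation, so "times," matches "times".
        if re.sub(r"\W", "", word).lower() == target:
            start = max(0, i - window)
            return " ".join(words[start:i + window + 1])
    return None  # term not found

TEXT = "It was the best of times, it was the worst of times, it was the age of wisdom"
print(snippet(TEXT, "worst", window=3))  # it was the worst of times, it
```

Even this toy version shows how much context drops out: everything outside the window is simply gone.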
Google considers anything published after 1922 to be copyrighted, except for government documents that had no copyright to begin with. Now they are inviting publishers to opt-in to their Print program, so that more than snippets can be displayed, and the publisher can get a cut of the sponsored links that are clicked on.
But you have to ask yourself, how many books that were published since 1922 are represented by current publishers who are aware of Google's plans and inclined to respond to Google's invitation to opt-in or opt-out? Consider that many publishers are no longer the rights holder once a book goes out of print, as contracts often stipulate that the copyright then reverts to the author. When Google talks about allowing publishers to opt-in to the Print program, or opt-out of the scanning, my guess is that we're talking about less than 20 percent of all copyrighted material that Google plans to grab.
The other 80 percent will be grabbed by Google without the "express consent" of the rights holder that is required by copyright law, usually with the rights holder not even being aware that an opt-out is available from Google. This is what Google has its eyes on, but it's not what they want you to think about when considering this issue. The used-book purchase links alone will be a cash cow for this 80 percent. Their statement that they will not show sponsored links on pages from copyrighted books that have not opted in is not enforceable, given that they can change their mind about that further down the road. It's just not fair to rights holders.
The proper procedure would be for Google to solicit permission for anything in copyright, and skip that book if there is no response. They should make an arrangement with some entity similar to the Copyright Clearance Center, and invite rights holders to submit permission forms for Google to scan their books. A license fee might be involved, so that these holders can get some compensation. The question of whether ads are allowed, or how much content can be displayed, could be negotiated as part of the license fee. Then if the library has the book, no one will complain when Google scans it. If it doesn't have the book, perhaps the rights holder can make a copy available if Google still wants it.
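The difference between the opt-out default and the permission-first procedure proposed above comes down to what happens when a rights holder never answers. Here is a minimal Python sketch with invented book records and function names; the 1922 cutoff is taken from Google's stated rule.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Book:
    title: str
    year: int
    government_document: bool = False
    permission: Optional[bool] = None  # None means the rights holder never answered

def treated_as_copyrighted(book):
    # Google's stated rule: published after 1922, unless a government document.
    return book.year > 1922 and not book.government_document

def may_scan_opt_out(book):
    # Current approach: scan everything unless the rights holder explicitly opts out.
    return book.permission is not False

def may_scan_opt_in(book):
    # Proposed approach: scan copyrighted works only with an explicit yes.
    return not treated_as_copyrighted(book) or book.permission is True

silent = Book("Out-of-print novel", 1965)  # rights holder unaware of Google's plans
print(may_scan_opt_out(silent), may_scan_opt_in(silent))  # True False
```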
That's what Google should be doing, instead of ripping off every rights holder since 1922 by default. There is more on this issue at Google Watch.
Some screen-shot links for those who want more information. (Wikipedia sometimes makes controversial pages disappear):
Essjay's user page at Wikia, where he "outed" himself:
http://www.wikipedia-watch.org/gifs/wmessjay.png
Previous details from an old user page at Wikipedia:
http://www.wikipedia-watch.org/gifs/essjay5.png
Essjay brags about how he fooled The New Yorker:
http://www.wikipedia-watch.org/essjay.html
Actually, Essjay is high-enough up in the administration so that he can alter his history and no one dares criticize him for it. He's been shoving the earlier stuff down the memory hole.
But there's a screen shot of some of his previous user information that was captured last month before he took it down. It's at http://www.wikipedia-watch.org/gifs/mtessjay.png
No, read The New Yorker for yourself, and then read Essjay's new personal information at the bottom of http://en.wikipedia.org/wiki/User:Essjay
Why is this news? Maybe because the Associated Press says it's news, and it's in hundreds of newspapers?
Why should Slashdotters care? Because while AP doesn't use links, Slashdot should have the courtesy of linking to the original sources that AP used to generate the report. (Plus AP also checked with Jimmy Wales for a reply, which is expected from professional reporters.)
The report is at http://www.wikipedia-watch.org/psamples.html
Wikipedia's own newsletter reports on it here:
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2006-10-30/Plagiarism_cleanup
The efforts of Wikipedia administrators to clean up the mess are chronicled here: http://en.wikipedia.org/wiki/User:W.marsh/list
Of course, Slashdotters may continue shooting from the hip if they choose. It's what they do best.
Perhaps I'm mistaken, but you seem to be suggesting that Google can only identify you if you are logged into Gmail while you use its search engine. This is not true, unless you are in the habit of deleting your cookies constantly. Google uses the same main cookie, with a unique ID in it, across all of its *.google.com services. You don't have to be logged into Gmail. They know who you are even when you are logged out of Gmail.
Wikipedia has no business creating biographies of living persons without their permission. A case might be made for doing this to a public person such as Kenneth Lay, but when it comes to someone like me, who is not notable, it can be distressing.
Last October I started http://www.wikipedia-watch.org/ because an anonymous administrator at Wikipedia, SlimVirgin, started a biographical article on me without asking, and with a preconceived notion that I was someone who needed to be put in his place because I had political views that were incompatible with hers.
There are more anonymous teenagers who are administrators at Wikipedia than you would believe. The subject of a biography is not allowed to contribute to his own article, and if he tries, they ban him like they banned me, and they also start getting downright nasty.
Wikipedia is okay on pop culture, on high-tech and scientific topics, and also on trivia that no one cares about. But when it comes to social and political issues, and biographies of living persons, what they need is a big fat libel and invasion-of-privacy lawsuit to stop them in their tracks.
It's not an encyclopedia. Instead, it's more like a video game that escaped from its box. It can be devastating to the victim, because all the search engines rank Wikipedia near the top. Human Resources departments are increasingly googling applicants before they call them for interviews.
A situation in which some anonymous teenager has the power to ruin someone's life because it's fun and interesting has to stop. I don't mind a newspaper reporter writing about me because reporters and newspapers are identifiable and accountable. More importantly, when they write about me it's generally within a very narrow context, and several days later the same newspaper is used to wrap fish. The short shelf life makes a newspaper article, even if it is negative, a non-issue for me.
Wikipedia, by comparison, stays at the top of the engines forever. I have to watch my biography every day, because I never know who might have vandalized or distorted it last night. John Seigenthaler's biography is still getting vandalized, and some of the vandalism takes three hours to be reverted. His article is one of the most-watched by the vandal patrol of all Wikipedia articles. What about someone like me? If I don't watch it, who will?
I want them to take my biography down, period. I've been trying for nine months. They just laugh at me.
The Form 990 for 2004 has been released by the Mozilla Foundation. They consider their $4.4 million in income from "search revenues" (apparently this is all from Google) to be part of their exempt function or purpose.
The IRS considers any advertising income realized by a nonprofit to be "unrelated business income" and subject to taxes. Question: What's in the contract with Google that Mozilla signed in 2004? Is it based on AdWords percentages? Opera's 2005 contract with Google works this way, so I assume that Mozilla's contract with Google does also.
Even if unrelated to AdWords, the fact remains that doing any sort of business with Google, which is an ad agency (99 percent of Google's income is from advertising), means that income from Google is unrelated to any nonprofit, exempt function or purpose.
And it's definitely not a donation. A charitable contribution requires that no goods or services are exchanged. The money passed from Google to Mozilla does not qualify as a contribution, because Google has received substantial benefit from the association.
Naughty Mozilla Foundation. They should pay their taxes.
A copy of Mozilla's 2004 Form 990 is available at http://www.scroogle.org/mozilla.html
Google Watch also runs Scroogle.org, a proxy that scrapes Google and/or Yahoo. One reply to the post says that I'm still a nut. But while I may or may not be a nut, this reply from an Anonymous Coward is wrong about the cookie. You don't need a globally-unique ID in a cookie to save the user's preferences. That is NOT the primary purpose of the cookie, but rather a convenient cover story for Google. The purpose of the cookie is so that you have a unique ID to tie together the activity of a single person who uses different IP addresses over time.
In fact, you don't even need a cookie to save preferences. All you need is a specially-crafted URL that you save as a bookmark.
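To make the bookmark idea concrete, here is a minimal Python sketch. It assumes the search engine reads preferences from query-string parameters; the names `num`, `hl`, and `safe` are illustrative assumptions rather than a documented Google API, and `preference_url` and `read_prefs` are hypothetical helpers. The point is that the settings travel inside the URL itself, so no unique ID is needed to honor them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; the parameter names used below are illustrative, not a documented API.
BASE_URL = "https://www.google.com/search"


def preference_url(query: str, **prefs: str) -> str:
    """Build a search URL that carries preferences in the query string, not a cookie."""
    params = {"q": query, **prefs}
    return BASE_URL + "?" + urlencode(params)


def read_prefs(url: str) -> dict:
    """Recover the settings from a saved bookmark; nothing in it identifies the user."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Hypothetical preferences: 100 results per page, English interface, filtering off.
bookmark = preference_url("scroogle", num="100", hl="en", safe="off")
print(bookmark)
print(read_prefs(bookmark))
```

Save the resulting URL as a bookmark and the same settings come along with every search, without relying on a cookie.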
Assuming that you delete your cookie constantly, or use a browser that lets you define your search engine cookie as a session cookie despite the expiration date, then the question becomes, "How do I change my IP address, which tends to be a bit too sticky for my tastes now that I'm on broadband?"
Broadband providers have different policies in different parts of the world, or even different parts of the U.S. But as someone who was recently a Time Warner Cable broadband subscriber and then switched to SBC/Yahoo DSL broadband, it seems to me that the key to getting a fresh IP address, at least in San Antonio, Texas, where I'm located, is to show a different network interface card MAC address to your provider.
I have two computers, and when I switch my Ethernet connection to the other computer, both the cable provider and the DSL provider tended to give me a new IP address. You have to power down the modem and the computer while the switch is made, or else one or both might remember the old IP address and cause it to be reassigned. Before powering down your computer, clear your old IP address in the window that shows your network connection, so that when it powers back up it asks for a new address instead of telling the modem what address it used to have.
Yes, your service provider probably has a list of all the IP addresses you ever used, and when you used them. But it's one step more complex for the bad guys to pull together a list like this from your service provider. Without this extra step, the information from Google won't be complete.
Of course, you can use Scroogle.org for your searches and not even worry about this stuff.
There is a chronology of how it was traced at the bottom of this page.
I am no genius. There was one chance in 10,000 that there would be a server on that IP address, and that it would be up when I tried it on impulse (it had timed out during nighttime hours all of last week).
Mr. Seigenthaler is very gracious in complimenting me, but I am no genius. Anyone who knows the difference between an IP address and a hot-dog with mustard could have done the same thing. That includes dozens, or maybe hundreds, of Wikipedians. But they didn't bother now, did they?
It was a pleasure to work with Mr. Seigenthaler on this trace. He is an amazing, accomplished person, and I have a huge amount of respect for him. Before his Wikipedia story came out, I wasn't aware of him.
He's the genius, although it is true that I know more about Internet infrastructure than he does. But I know nothing that would impress all the clever Slashdotters reading this, I'm sure.
I've been thinking more lately about the president of the American Library Association, Michael Gorman, and the objections he has to the Google Book Search. He's almost the only person I've heard of who objects not on the basis of copyright, but rather on the basis of the atomization of information. Then I did a search on the name of one of the people behind Google-Watch, and compared Google's snippet containing his name to the actual text from the book. Atomization? Heck, he got completely nuked by the snippet. I fear for the future of education.
Google is already working on a design for tripods.
Wikipedia, the most-scraped site on the planet, indirectly generates massive numbers of ads. It is inevitable that Jimmy Wales, already a rich man, will want to get richer.
It's primarily Google's fault, according to http://www.wikipedia-watch.org/
Jimmy Wales runs Wikipedia from the profits that come from Bomis and from donations. Bomis.com is a porn directory network with an innocent-looking front end, and a huge number of ads and paid links.
Wikipedia is straining under the load from a massive increase in traffic. This is due to the buzz from the media, as well as impressive rankings in Yahoo and Google.
Most of the insider administrators are anonymous, and they can use their editing privileges to stomp on any initiatives from the unwashed masses that they find objectionable. The word "cult" comes to mind. Recently there has been a move to require footnote citations for most assertions, in order to make the articles appear neutral. However, in my experience last week with Jimmy and one of his top anonymous admins, SlimVirgin, it seems to me that if the citation itself looks like an opposing opinion, then that's good enough. No one over there actually reads the stuff they cite; there's no time for that.
The only defense the unanointed have is to put together their own list of CGI proxies, and give them a hard time for a couple of days. But the admins have many more "rollback" weapons to make it easy to "revert" any changes, which makes this too much trouble for any single unprivileged person.
I predict that before Wikipedia breaks under the traffic load, Jimmy will start running AdSense or Yahoo ads. At that point a lot of editors will probably leave, since their work is volunteer and they might now see Wikipedia as something quite different. Look at what the Google tie-in did for Mozilla Foundation, for example. Ads on Wikipedia could potentially generate millions per year.
Then he'll bank most of the money, buy some more bandwidth to keep it going as long as he can, but ultimately let it run down. I don't for a minute believe that Jimmy is motivated by this:
"Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing." -Jimmy Wales, July 2004
Google, and yes, Yahoo and MSN, need to answer at least three crucial questions:
1) Google admits that your search terms are saved along with your unique cookie ID and your IP address, and a time/date stamp. However, they spin this by suggesting that it's merely part of the normal logging process. The question is this: To what extent does Google parse out and database this information for future reference and easy access?
2) Does Google have any data retention policies for various types of data, or do they keep it all forever?
3) On a country-by-country basis, how many requests does Google get from government officials for user information? We don't need names, but we need numbers, so that we can judge the comparative risks of using Google or other search engines for each country. China, for example, is riskier than the U.S., but even in the U.S. it's illegal for Google to disclose a request if it's a national security matter. However, they could reveal statistics if they wanted to.
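Question 1 is about what becomes possible once log entries are databased. As a rough illustration only, here is a minimal Python sketch that assumes hypothetical log records holding the four items Google admits to keeping (search terms, cookie ID, IP address, timestamp). The record layout, field names, and sample data are invented and say nothing about how Google actually stores its logs; the sketch simply shows that keying on the cookie ID ties together searches made from different IP addresses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchLogEntry:
    """One hypothetical log line: the four items the post says Google records."""
    timestamp: str
    ip_address: str
    cookie_id: str
    search_terms: str


def build_profiles(entries):
    """Index log entries by cookie ID, collecting every IP address and search per ID."""
    profiles = defaultdict(lambda: {"ip_addresses": set(), "searches": []})
    for entry in entries:
        profile = profiles[entry.cookie_id]
        profile["ip_addresses"].add(entry.ip_address)
        profile["searches"].append((entry.timestamp, entry.search_terms))
    return dict(profiles)


# Invented sample data: one cookie seen from two different IP addresses.
log = [
    SearchLogEntry("day 1 09:00", "203.0.113.5", "ID=abc123", "cheap flights"),
    SearchLogEntry("day 9 21:15", "198.51.100.7", "ID=abc123", "medical symptoms"),
    SearchLogEntry("day 9 22:40", "198.51.100.7", "ID=xyz789", "weather"),
]

profiles = build_profiles(log)
for cookie_id, profile in profiles.items():
    print(cookie_id, sorted(profile["ip_addresses"]), len(profile["searches"]))
```

As long as the cookie survives, a new IP address does not separate your searches; only deleting the cookie breaks the link.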
Put this on your website and title it "The Year of the Rat."
I wrote to administrators, and to each Regent at the University of Michigan, trying to get them to look at Section 108 of the Copyright Act, which specifies the limits of library copying as an instance of fair use. I believe that Section 108 prohibits U.Michigan from what they are allowing Google to do with their books.
I also wrote to the Authors Guild, expressing my frustration that U.Michigan ignored me. A copy of that letter is here.
Google is a weed growing in the copyright garden. I was thinking that Section 108 might be used to trim back the weed. The Authors Guild is wisely hoping to pull it out by the roots. If that doesn't work, maybe they can trim it back later.
I'm quite happy with the Authors Guild suit. It looks to me like they know what they're doing. Maybe five years down the road, when the Supremes establish that opt-in also applies to websites, then we'll be able to force robots.txt into an opt-in mode instead of the present opt-out mode. That will fragment the monopoly of the big search engines, and help to give the web back to the webmasters.
If Google could show snippets from books without first copying the entire book, and if they did this without any commercial interest or intent, then I think they might have a fair-use argument. But there are some hurdles before they can get to that argument.
Many Google acolytes like to point out that Google already grabs much of the web in its entirety, which is copyrighted by default. That's true. But that doesn't mean it's legal. All it means is that search engines started doing this before webmasters got organized into associations (they still aren't organized), and there was no one to challenge the engines.
Now if webmasters had been around as long as authors, and were organized to protect their interests, the engines would have never gotten this far with their illegal crawling for profit.
For me, the issue is that Google, a rich corporation, has talked some libraries into providing access to their collections, even though the library is not the rights holder for the copyrighted works they own. The library that is most eager to let Google scan everything is the University of Michigan, a public institution.
The contract with U.Michigan was confidential until they posted it in response to a request I filed under Michigan's freedom of information law. Google gets to scan everything, and U.Michigan gets a copy of the scanned files. However, U.Michigan is not able to do anything with their copies except to offer them on their own website, assuming that they take measures to prevent excessive downloading and automated crawling.
By way of contrast, Google gets to do anything it wants with its copies, forever, and that includes selling them to partners or passing them along to any successor of Google. They will show ads for where to buy copies of out-of-print books. The entire book will be scanned, but only snippets will be shown surrounding the search term for books that are in copyright. With this latest announcement, they say that they will not show sponsored links unless the publisher agrees to join in the Google Print program.
Google considers anything published after 1922 to be copyrighted, except for government documents that had no copyright to begin with. Now they are inviting publishers to opt-in to their Print program, so that more than snippets can be displayed, and the publisher can get a cut of the sponsored links that are clicked on.
But you have to ask yourself, how many books that were published since 1922 are represented by current publishers who are aware of Google's plans and inclined to respond to Google's invitation to opt-in or opt-out? Consider that many publishers are no longer the rights holder once a book goes out of print, as contracts often stipulate that the copyright then reverts to the author. When Google talks about allowing publishers to opt-in to the Print program, or opt-out of the scanning, my guess is that we're talking about less than 20 percent of all copyrighted material that Google plans to grab.
The other 80 percent will be grabbed by Google without the "express consent" of the rights holder that is required by copyright law, usually with the rights holder not even being aware that an opt-out is available from Google. This is what Google has its eyes on, but it's not what they want you to think about when considering this issue. The used-book purchase links alone will be a cash cow for this 80 percent. Their statement that they will not show sponsored links on pages from copyrighted books that have not opted in is not enforceable, given that they can change their mind about that further down the road. It's just not fair to rights holders.
The proper procedure would be for Google to solicit permission for anything in copyright, and skip that book if there is no response. They should make an arrangement with some entity similar to the Copyright Clearance Center, and invite rights holders to submit permission forms for Google to scan their books. A license fee might be involved, so that these holders can get some compensation. The question of whether ads are allowed, or how much content can be displayed, could be negotiated as part of the license fee. Then if the library has the book, no one will complain when Google scans it. If it doesn't have the book, perhaps the rights holder can make a copy available if Google still wants it.
That's what Google should be doing, instead of ripping off every rights holder since 1922 by default. There is more on this issue at Google Watch.