Web Caching: Google vs. The New York Times

Posted by timothy on Sunday July 13, 2003 @09:58PM from the right-to-be-annoying dept.

An anonymous reader writes "The Google cache is a popular feature among karma fetishists. Many stories with links to the NY Times attract comments pointing to Google's copy of the article. This gives readers access to the content without registering. C|Net reports that Google is in talks with the NY Times to close this backdoor. The article raises some general concerns regarding the caching of web content. Shouldn't the NY Times simply tell Google not to cache their site?"

16 of 518 comments

Free registration by Zog+The+Undeniable · 2003-07-13 22:00 · Score: 5, Insightful

I'd love to see their user database, just to count the number of Mickey Mice and Elmer Fudds on there. Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?

--
When I am king, you will be first against the wall.
1. Re:Free registration by presroi · 2003-07-13 22:06 · Score: 5, Insightful
  
  Maybe we can agree that the NYT is a well-written, serious and interesting newspaper. Not just for New Yorkers but also for people from Sweden, Japan or New Jersey.
  
  Where would the limit be? How would you feel if you had to register for every web page which is linked to at /. (I confess, I usually click on every /.-story link)?
  
  hmm, to answer your question:
  maybe the point of registration is the signing of a contract on how to use this content. Dunno.
2. Re:Free registration by Anonymous Coward · 2003-07-13 22:41 · Score: 5, Insightful
  
  And on top of everything else, it annoys users more than just about anything else aside from spam. Can't recall exactly how many other people I know who go to see a NYT article, find the rego page, and ignore it to go find a better news source without the hassle.
  
  If they're tracking what their users do, they're affecting their user pool in a pretty negative way just by using this method.
3. Re:Free registration by FatAlb3rt · 2003-07-14 01:32 · Score: 4, Insightful
  
  I disagree. Let's imagine for a minute that everyone provides an accurate profile, targeted marketing works, sales increase, and the advertiser gets rich.
  
  You really think that the money they spend on advertising will level off?
4. Re:Free registration by NexusTw1n · 2003-07-14 01:33 · Score: 5, Insightful
  
  I always find it ironic when people on slashdot complain about being "tracked" on NYTimes webpages or other sites that require registration.
  
  Most people have registered to use /., and have therefore provided a valid email address. So you can't have a moral objection to giving your email addy to websites you frequent.
  
  Even if you don't register, your IP address is logged and monitored via the sophisticated anti-troll system. Try to post more than 10 times in one day as an AC, or post as an AC in reply to a post you modded, and slashcode will react.
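To make the per-IP limit described above concrete, here is a minimal sketch of a daily cap on anonymous posts. It is not Slashcode's actual implementation, which is not shown in this thread. The limit of 10 comes from the comment; the class and method names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumed from the comment above: more than 10 anonymous posts per day is refused.
DAILY_AC_LIMIT = 10


class AnonymousPostLimiter:
    """Counts anonymous posts per (IP address, day) and refuses posts past the limit.

    Hypothetical illustration only; names and storage are invented for this sketch.
    """

    def __init__(self, limit=DAILY_AC_LIMIT):
        self.limit = limit
        self._counts = defaultdict(int)  # (ip, day) -> posts accepted so far

    def allow_post(self, ip, day=None):
        """Return True and record the post if this IP is still under today's limit."""
        day = day or date.today()
        key = (ip, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Even this toy version has to keep a log of IP addresses, which is the poster's point: an AC is rate-limited by address, so not fully anonymous.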
  
  So even as an AC you aren't really totally anonymous on slashdot, yet I don't see anyone who complains about NY Times links complaining about that. The only people who complain are the trolls that forced these features to be added to the code.
  
  So why do we have this tedious bitching about the NY times every time a link is posted?
  
  I registered a couple of years ago. I've never received a single spam to NYTimes@mydomain.com which was the email addy I used. I've never had to log in because the login cookie has remained in Opera since I registered. How hard is it to log in and then forget about it forever more?
  
  The only reason I haven't forgotten I've registered is the continual complaints on slashdot from people who are obsessed with privacy on the net unless karma is involved. NY Times doesn't spam registered users, and any user tracking is less sophisticated than slashcode's vital anti-troll features. So bear that in mind when tomorrow's NY Times story appears and the same old complaints are dragged out yet again.
  
  --
  It has become appallingly obvious that our technology has exceeded our humanity. --Albert Einstein
Test Question by Effugas · 2003-07-13 22:17 · Score: 4, Insightful

You are the new editor of the New York Times, the "Newspaper of Record" for the United States, if not the world. You are, of course, the new editor because the previous editor had to resign, taking the blame for an individual reporter's flagrant disregard for the awe-inspiring credibility of your institution. In the process of rebuilding your credibility, should you:

A) Insist that unaffiliated digital libraries restrict access to or simply eliminate all records of your "Newspaper of Record", or
B) Realize that maybe right about now is not particularly the best time to be saying to the world, "Please forget what we published last week."
There's no such thing as free registration by pslam · 2003-07-13 22:24 · Score: 5, Insightful

Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?
That's the thing - it's not free, depending on your definition. By my own definition, you're giving them valuable information, and they get to keep it and use it as they will, including spamming if they feel like it (or spam from any company which buys them out, or which they sell it to if they're going bankrupt, etc). It's practically misadvertising of a service, but it's accepted now, so everyone gets away with it.
If it really were free, why would you need to register in the first place?
Re:NY Times likes accuracy by anshil · 2003-07-13 22:28 · Score: 4, Insightful

Since when is content published in the WWW about privacy?

It's just like a government that wants to control which newspapers may be archived for history research.

--
Karma 50, and all I got was this lousy T-Shirt.
Re:Erm...cache? by Neophytus · 2003-07-13 22:32 · Score: 5, Insightful

I was thinking the same thing. I can't recall seeing a NYT article linked from here with the Google cache banner across the top; what I do see a lot are the partner links. Google already provides for register-only news sites (Financial Times?) by putting a [reg only] tag beside the article. Why the NYT has chosen not to use this up until now is a tad strange, and it looks like someone has picked up the wrong end of the stick.
Shouldn't someone simply tell the NY Times: no reg by StrawberryFrog · 2003-07-13 22:35 · Score: 4, Insightful

Brand recognition is not always a good thing. When I think NY times I think "that annoying registration website". They are free to do what they want, but it leaves me cold.

--
My Karma: ran over your Dogma
StrawberryFrog
Free registration and the RIAA by mike_mgo · 2003-07-13 22:36 · Score: 5, Insightful

It's articles like this that make me think that the recording and movie industries are right to go after online piracy with everything they've got.
Here we have the NYT, one of the premier news organizations in the world, offering its articles for free on the same day that they are published. Yet a large number of people, of this online community at least, refuse to provide even a minimal amount of information (and no money) so that the newspaper can try to make its online presence profitable.
I think the spam fears are a red herring. I've been registered with the Times for over 2 years, and I've never gotten spam that I think is traceable to them. I get a daily email of the day's headlines (and with the click of a box I could discontinue this).
Why should the RIAA change its business model to a pennies per song method when there is such a blatant example of the online community refusing to go directly to the source for even free material?
That's not what they want. by twitter · 2003-07-13 23:36 · Score: 4, Insightful

Sure, that robots.txt should keep robots out of the entire NYT site. That's not how Google works, though. Google gets their rankings for the NYT from other sites that point to the NYT. I imagine they only archive a page when it reaches sufficient rank. This way, Google would never have to crawl through the NYT site. We can be sure that Google would be happy to drop NYT points and caches if they were asked to do that.
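As a rough illustration of how a well-behaved crawler reads robots.txt, here is a small sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies and the example.com URLs are made up for illustration. They are not the NYT's actual file, and nothing here shows how Google's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts every crawler out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only fences off one directory.
BLOCK_ARCHIVE = """\
User-agent: *
Disallow: /archive/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With `BLOCK_ALL`, `allowed(BLOCK_ALL, "Googlebot", "http://www.example.com/story.html")` is refused. With `BLOCK_ARCHIVE`, only URLs under `/archive/` are off limits. Either way the file controls crawling, not what a search engine does with links it learns about from other sites.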
The New York Times wants Google to continue ranking their stories but they want Google to do them the special favor of only pointing to their registration page:
"We are working with Google to fix that problem--we're going to close it so when you click on a link it will take you to a registration page," said Christine Mohan, a spokeswoman at New York Times Digital.
If I were Google, I'd tell them such advertising services would cost them a great deal of money. That or simply drop the New York Times right into the bit bucket. It will cost Google programming time to make it happen and computing time to keep it going. If every site on the web required this kind of custom treatment, Google's task would be much more difficult and it might be easier for them to drop it.
Dropping the NYT from Google is fine by me. People who don't understand the implications of digital publishing don't deserve readership. If they won't let librarians make digital copies, libraries should drop them too. What's next, the New York Times sends cease and desist orders to everyone who runs a proxy? It's like the NYT is trying to make their digital publication harder to share than their paper one was. A paper copy can be shared by an entire office and that's what a proxy does. A paper copy can be indexed and archived by a librarian, and Google did not even do that much. One day the paper version won't be available. If librarians can't keep their own copies of the digital version for verification, the publication will have no credibility. If the New York Times wants to continue charging advertisers for eyeballs, they had better remember that their credibility is based in part on widespread availability.

--
Friends don't help friends install M$ junk.
Our basic copyright assumptions are wrong by putaro · 2003-07-13 23:46 · Score: 4, Insightful

The technology has changed the way that things work but the law has not kept up with it. To start with, we continue to talk about "copyright". Controlling copying of information makes sense when the distribution mechanism is trucks moving bales of paper around. Once you start sending bits around, everything is copied. From the article:

And technically, any time a Web surfer visits a site, that visit could be interpreted as a copyright violation, because the page is temporarily cached in the user's computer memory.

When you have the newspaper delivered to your door, the content basically comes for free (the cost of a newspaper doesn't pay for much more than printing and handling). However, you get to keep the content as long as you like, chop it into bits and what not. Libraries have archives of newspapers going back years and you get to see them for free. What's the right mechanism as we move forward? The "pay per view" model that content providers want to shove down our throats courtesy of the DMCA is not pretty and when it starts to affect the average Joe I suspect it will be booed out of favor pretty quickly. But what is the right mechanism to make sure content providers get paid something and that we, the citizens, get something for our money?
Surely just to increase exposure by Mostly+a+lurker · 2003-07-14 00:00 · Score: 4, Insightful

As many others have emphasised, it is easy to turn off the Google cache for whatever pages you wish. But, in the case of the NYT, there is a further factor. They must have special code within their system to recognise the Google spider and allow it access without registration. Either that, or there is some other prior agreement allowing access. Given that, they can scarcely claim extra work to support Google. I believe the whole thing is mainly to get some free publicity for their site. I suppose the other possibility is that they want the page accessible from Google News but not the regular search engine cache.
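For reference, the usual per-page way to turn off the cache is a robots meta tag with the `noarchive` directive, e.g. `<meta name="robots" content="noarchive">`. Here is a minimal sketch of how a cache-respecting crawler might check for it. The class and function names are hypothetical, and this is not a description of Google's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def cache_allowed(html):
    """Return False if the page asks crawlers not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```

A page carrying the tag can still be indexed and ranked; only the cached copy is suppressed, which seems to be the distinction the NYT is after.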
No pity for the NYT... by qtp · 2003-07-14 00:01 · Score: 5, Insightful

The NYT needs to call off the lawyers and seriously think about how they brought this on themselves.

There are so many models for running a news site that avoid this problem (Salon) that calling out the lawyers is just childish and inappropriate. If a site wants to be indexed by a search engine, then they should be aware of what that means, and if they don't like how a particular search engine functions, then they should take measures to change their own site to prevent what they don't want indexed, or cached, from being accessed.

I know that finding pages on Google that I cannot access would be infuriating, and I hope that Google realizes that many of their users would agree.

--
Read, L
The Web and the Internet - Sad, sad, sad by miu · 2003-07-14 01:11 · Score: 4, Insightful

I had to laugh seeing this little gem attached to the story:
Special Report
The Google gods
Does the search engine's power threaten the Web's independence?
The Web's independence? The fucking web is a sad little microcosm of the real world. Google is one of the few reasons I can still stand the web, and silly statements like "Google is making copies of all the Web sites they index and they're not asking permission" are the reason the web sucks so bad. When everyone is deathly afraid of being sued or prosecuted for something it's no wonder that the web is such a clown town of worthless crap.

--

[Set Cain on fire and steal his lute.]