Slashdot Mirror


Web Caching: Google vs. The New York Times

An anonymous reader writes "The Google cache is a popular feature among karma fetishists. Many stories with links to the NY Times attract comments pointing to Google's copy of the article. This gives readers access to the content without registering. C|Net reports that Google is in talks with the NY Times to close this backdoor. The article raises some general concerns regarding the caching of web content. Shouldn't the NY Times simply tell Google not to cache their site?"

10 of 518 comments

  1. Re:Free registration by whm · · Score: 5, Informative

    Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?

    User tracking. While cookies can do this loosely, requiring a login does this much more effectively. I know I log in with the same username each time I visit the site (if it's not cached). There's very little reason not to. This gives the NYT a much better indication of how many active and repeat members they have visiting their site. They can then target ads to users much more effectively, and market their userbase to advertisers much more solidly than they could with more rudimentary user tracking methods.

    There may be other purposes, but this seems like a large part of it.

  2. It raises 2 questions .. by Mr_Silver · · Score: 4, Informative
    such as:
    1. When will slashdot stop linking to articles that require a registration?
    2. When will slashdot consider implementing caching for pages that, by linking to, they manage to take off the internet?
    Sure, the 2nd question has been answered in the FAQ. Except it was written three years ago and Google manages this just fine. Maybe time for a second look?

    On the topic of site updates, has anyone noticed that 90% of the links on http://slashdot.org/code.shtml don't work any more?

    Hell the link to an Avantgo version of Slashdot points to a website which has been broken for over 2 years.

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  3. Erm...cache? by DennyK · · Score: 4, Informative

    The article talks about Google's caching of articles that have expired to the NYT archives (which you have to pay to access). What most /. folks use to link to current NYT articles are the Google partner links, which simply bypass the free registration. I'd assume these links only work as long as an article hasn't been archived yet, so the karma whores are safe; I doubt the NYT's Google partner links will be going away any time soon... ;)

    DennyK

  4. Re:Google - more useless everyday by cioxx · · Score: 4, Informative
    Sometime back, I pointed out how Google seems to have a soft corner for articles and sites that affect big firms such as Microsoft.


    "Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention. While the sources of the news vary in perspective and editorial approach, their selection for inclusion is done without regard to political viewpoint or ideology. While this may lead to some occasionally unusual and contradictory groupings, it is exactly this variety that makes Google News a valuable source of information on the important issues of the day." source

    Remove your tinfoil hat please. There is no conspiracy. Google News features articles from Newsmax, Electronic Intifadah, Islam Online, Al Jazeera, World Net Daily, etc. If there was any filtering going on, these sites would have been off the radar a long time ago.

    Also, Slashdot is not a professional journalistic site. It's a News-based comment board where people come to share their opinion. In a perfect world Slashdot doesn't even belong on Google News.

  5. Re:God damnit... by anonymous+loser · · Score: 4, Informative

    The *real* karma whores link to http://archive.nytimes.com anyway.

    NYTimes have futzed around with it a bit, but if you play with it, it still gives you registration-free access to their content, it just takes a couple of clicks nowadays.

  6. Re:NY Times likes accuracy by MonTemplar · · Score: 4, Informative

    What he said! Remember, the first two W's are for World Wide.

    The only people who seem to have a problem with webpage caching are either legal flacks working in CYA Mode, or webmasters who can't be bothered to mark up their pages and add robots.txt files to make sure that only public information goes out of their websites.

    --
    -MT.
  7. Re:Free registration..some implications by gilroy · · Score: 5, Informative
    Blockquoth the poster:

    and lastly, once a site requires registration, even if free, Copyright ptohibits [sic] quoting entire articles on the web.

    Actually, registration is not required to protect a work. Creating a work automatically protects it under copyright law -- no need for registration, user fees, or that little (c) thingy. At least in countries respecting the Berne Convention.
  8. Demographics by autopr0n · · Score: 4, Informative

    I've never been sent a single spam from the NYT. The reason they want this is for demographics. A) it tells them who their web readers are, and B) it tells their advertisers who their web readers are. And it also allows them to show ads for products people would be most interested in.

    --
    autopr0n is like, down and stuff.
  9. meta tags ? by matrix0040 · · Score: 5, Informative

    Well, can't they just use meta tags to prevent archiving of their pages?

    <META NAME="robots" CONTENT="noarchive">

    from
    http://www.google.com/bot.html
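
For anyone who wants to check whether a page carries that tag, here is a minimal sketch using only Python's standard library. The class and function names (`RobotsMetaParser`, `allows_cached_copy`) are made up for illustration, and the logic only looks for a `noarchive` directive in a `robots` or `googlebot` meta tag; it is not a description of how Google's crawler actually processes pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def allows_cached_copy(html):
    """Return False if the page asks robots not to keep an archived copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="noarchive"></head></html>'
    print(allows_cached_copy(page))
```

The tag is a per-page request, so a site would have to emit it in the head of every article template it does not want cached.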

  10. Re:Free registration by yelvington · · Score: 4, Informative

    NYT doesn't spam. And the percentage of net.morons who register using cartoon names is remarkably low.

    I don't work for the New York Times, but for another media company, and I'm in a position to understand the reasons for registration:

    1. Metrics. Registration supports the generation of accurate data on demographics and usage (reach, frequency) in a crosstabulated view. This is important in analytical processes to support site management and design as well as in the sale of advertising, which provides the revenue that makes the site possible.

    2. Ad targeting. Run-of-site, untargeted Internet advertising is nearly worthless on the open market (supply/demand), but advertising that is highly targeted remains highly valuable. When combined with proper analytical software and usage data, registration data can -- for example -- let me target 25- to 34-year-olds in a particular ZIP code who have been looking at real estate listings. And I can deliver that advertising anywhere on my site, such as on sports pages that otherwise would contain "junk" ad inventory. This is (measurably!) much more efficient and effective, and I can charge fairly high CPM prices. Importantly, this can be accomplished without providing any personal data to the advertiser, protecting the anonymity of the user.

    3. Reduction in traffic. Reduction is actually desirable in many cases. Not all customers are good customers, and not all traffic is good traffic.

    On the Google issue: I used robots.txt to block Google from indexing the AP content on our 27 newspaper sites, because I have no desire to be the unpaid provider of wire stories for Google News so that they can be read by users outside our markets. Additionally, I have used a router block to prevent several commercial Web clipping services from having access of any sort to any of our sites.
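
A rough sketch of what a robots.txt block like the one described above might look like, checked with Python's standard `urllib.robotparser`. The `/ap/` path and the `news.example.com` host are assumptions made up for this example; the poster does not say where the wire content actually lives or what rules were used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of the wire-story directory,
# leave everything else open to all crawlers. The /ap/ path is an assumption.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /ap/

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Wire story: blocked for Googlebot.
    print(rules.can_fetch("Googlebot", "http://news.example.com/ap/story.html"))
    # Locally written story: still crawlable by Googlebot.
    print(rules.can_fetch("Googlebot", "http://news.example.com/local/story.html"))
    # Other crawlers are not affected by the Googlebot-specific rule.
    print(rules.can_fetch("OtherBot", "http://news.example.com/ap/story.html"))
```

Note that robots.txt is only a request that well-behaved crawlers honor, which is presumably why the clipping services got a router block instead.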