Slashdot Mirror


Web Caching: Google vs. The New York Times

An anonymous reader writes "The Google cache is a popular feature among karma fetishists. Many stories with links to the NY Times attract comments pointing to Google's copy of the article. This gives readers access to the content without registering. C|Net reports that Google is in talks with the NY Times to close this backdoor. The article raises some general concerns regarding the caching of web content. Shouldn't the NY Times simply tell Google not to cache their site?"

17 of 518 comments

  1. Re:Free registration by whm · · Score: 5, Informative

    Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?

    User tracking. While cookies can do this loosely, requiring a login does this much more effectively. I know I log in with the same username each time I visit the site (if it's not cached). There's very little reason not to. This gives the NYT a much better indication of how many active and repeat members they have visiting their site. They can then target ads to users much more effectively, and market their userbase to advertisers much more solidly than they could with more rudimentary user tracking methods.

    There may be other purposes, but this seems like a large part of it.

  2. It raises 2 questions .. by Mr_Silver · · Score: 4, Informative
    such as:
    1. When will slashdot stop linking to articles that require a registration?
    2. When will slashdot consider implementing caching for pages that, by linking to, they manage to take off the internet?
    Sure, the 2nd question has been answered in the FAQ. Except it was written three years ago and Google manages this just fine. Maybe time for a second look?

    On the topic of site updates, has anyone noticed that 90% of the links on http://slashdot.org/code.shtml don't work any more?

    Hell the link to an Avantgo version of Slashdot points to a website which has been broken for over 2 years.

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  3. Erm...cache? by DennyK · · Score: 4, Informative

    The article talks about Google's caching of articles that have expired to the NYT archives (which you have to pay to access). What most /. folks use to link to current NYT articles are the Google partner links, which simply bypass the free registration. I'd assume these links only work as long as an article hasn't been archived yet, so the karma whores are safe; I doubt the NYT's Google partner links will be going away any time soon... ;)

    DennyK

  4. Re:Free registration..some implications by jkrise · · Score: 3, Informative

    Actually, free reg requires a valid email id. It thus filters most bogus registrations. Secondly, news sites are planning to go the 'pay' way in about a couple of years. Getting readers to register would give more accurate estimates of readership.

    And lastly, once a site requires registration, even if free, Copyright ptohibits quoting entire articles on the web. This indeed could be the prime reason for this.

    --
    If you keep throwing chairs, one day you'll break windows....
  5. Re:Google - more useless everyday by cioxx · · Score: 4, Informative
    Sometime back, I pointed out how Google seems to have a soft corner for articles and sites that affect big firms such as Microsoft.


    "Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention. While the sources of the news vary in perspective and editorial approach, their selection for inclusion is done without regard to political viewpoint or ideology. While this may lead to some occasionally unusual and contradictory groupings, it is exactly this variety that makes Google News a valuable source of information on the important issues of the day." source

    Remove your tinfoil hat please. There is no conspiracy. Google News features articles from Newsmax, Electronic Intifadah, Islam Online, Al Jazeera, World Net Daily, etc. If there were any filtering going on, these sites would have been off the radar a long time ago.

    Also, Slashdot is not a professional journalistic site. It's a News-based comment board where people come to share their opinion. In a perfect world Slashdot doesn't even belong on Google News.

  6. Sweet irony by Amomynos+Coward · · Score: 3, Informative

    In case the cnet article is /.'ed, here's a link to the Google cached page.

  7. Re:God damnit... by anonymous+loser · · Score: 4, Informative

    The *real* karma whores link to http://archive.nytimes.com anyway.

    NYTimes have futzed around with it a bit, but if you play with it, it still gives you registration-free access to their content, it just takes a couple of clicks nowadays.

  8. Re:NY Times likes accuracy by MonTemplar · · Score: 4, Informative

    What he said! Remember, the first two W's are for World Wide.

    The only people who seem to have a problem with webpage caching are either legal flacks working in CYA Mode, or webmasters who can't be bothered to mark up their pages and add robots.txt files to make sure that only public information goes out of their websites.

    --
    -MT.
  9. Re:Free registration..some implications by gilroy · · Score: 5, Informative
    Blockquoth the poster:

    and lastly, once a site requires registration, even if free, Copyright ptohibits [sic] quoting entire articles on the web.

    Actually, registration is not required to protect a work. Creating a work automatically protects it under copyright law -- no need for registration, user fees, or that little (c) thingy. At least in countries respecting the Berne Convention.
  10. Demographics by autopr0n · · Score: 4, Informative

    I've never been sent a single spam from the NYT. The reason they want this is for demographics. A) it tells them who their web readers are, and B) it tells their advertisers who their web readers are. And it also allows them to show ads for products people would be most interested in.

    --
    autopr0n is like, down and stuff.
  11. meta tags ? by matrix0040 · · Score: 5, Informative

    Well, can't they just use meta tags to prevent archiving of their pages?

    <META NAME="robots" CONTENT="noarchive">

    from
    http://www.google.com/bot.html
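
    For what it's worth, here is a rough sketch of how a crawler might honour that tag. It is only an illustration, not Google's actual code: the class name RobotsMetaParser and the function may_cache are made up for this example, and it uses nothing but Python's standard html.parser module.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_cache(html_text):
    """Hypothetical helper: True unless the page carries a 'noarchive' directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" not in parser.directives


page = '<html><head><META NAME="robots" CONTENT="noarchive"></head><body>Story</body></html>'
print(may_cache(page))
```

    A page without the tag is treated as cacheable here, which matches the opt-out nature of the directive.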

  12. Re:Um... by broeman · · Score: 3, Informative

    nope Sir, you are wrong. Wired Magazine is indeed commercialized on Wired Website. Nobody talked about company relations, well, before you did. And I still see the Lycos bar when I am on Wired Magazine's Homepage.

    --

    (yes this can be compared with sex)
  13. Re:Free registration by yelvington · · Score: 4, Informative

    NYT doesn't spam. And the percentage of net.morons who register using cartoon names is remarkably low.

    I don't work for the New York Times, but for another media company, and I'm in a position to understand the reasons for registration:

    1. Metrics. Registration supports the generation of accurate data on demographics and usage (reach, frequency) in a crosstabulated view. This is important in analytical processes to support site management and design as well as in the sale of advertising, which provides the revenue that makes the site possible.

    2. Ad targeting. Run-of-site, untargeted Internet advertising is nearly worthless on the open market (supply/demand), but advertising that is highly targeted remains highly valuable. When combined with proper analytical software and usage data, registration data can -- for example -- let me target 25- to 34-year-olds in a particular ZIP code who have been looking at real estate listings. And I can deliver that advertising anywhere on my site, such as on sports pages that otherwise would contain "junk" ad inventory. This is (measurably!) much more efficient and effective, and I can charge fairly high CPM prices. Importantly, this can be accomplished without providing any personal data to the advertiser, protecting the anonymity of the user.

    3. Reduction in traffic. Reduction is actually desirable in many cases. Not all customers are good customers, and not all traffic is good traffic.

    On the Google issue: I used robots.txt to block Google from indexing the AP content on our 27 newspaper sites, because I have no desire to be the unpaid provider of wire stories for Google News so that they can be read by users outside our markets. Additionally, I have used a router block to prevent several commercial Web clipping services from having access of any sort to any of our sites.
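
    A minimal sketch of what such a robots.txt rule could look like, checked with Python's standard urllib.robotparser. The /ap/ path and the bot names are assumptions made up for illustration; the actual layout of the sites described above is not given in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of an assumed /ap/ directory
# while leaving the rest of the site, and all other crawlers, unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /ap/

User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the rules above let `agent` fetch `path`."""
    return rules.can_fetch(agent, path)


print(allowed("Googlebot", "/ap/story.html"))
print(allowed("Googlebot", "/local/story.html"))
print(allowed("SomeOtherBot", "/ap/story.html"))
```

    Note that robots.txt only works for crawlers that choose to obey it, which is presumably why a router block was used for the clipping services.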

  14. You are welcome to use xxxxdd@xxxx.com any time. by Futurepower(R) · · Score: 3, Informative


    Your comment was confusing to me until I realized that you are talking about giving NYT an actual email address. Why would you do that? Isn't that why we have hotmail.com? Give an address that does not exist or a throw-away address.

    Last week I was registering at a web site and I put in xx@xx.com for the address. The system responded, "This address has already been registered." So then I put in xxx@xxx.com. The system responded, "This address has already been registered." So I entered xxxx@xxxx.com. Same response. Finally I awoke fully and entered some Ds, xxxxdd@xxxx.com, and the system accepted my "registration".

  15. Re:Free registration by mysticgoat · · Score: 3, Informative

    You've brought out some very good information in a well-written way. Thank you. I'll cover much of the same ground from the satisfied user's viewpoint.

    1. NYT and spam: there is no relationship between these. That's my experience after years of subscription, and a number of other people on this thread report the same thing. The Yahoo portal news service is also good this way (and gives me Reuters: an excellent supplement to NYT).
    2. The metrics thing: I provided NYT with true demographics when I signed up, because I know that will help them deliver product more efficiently and sell their advertising.

      I want that. I like the service NYT provides, and so I want them to succeed. I very much want them to continue to provide me with a free subscription-- and I'm willing to help them hold their costs down and maximize their advertising revenues.

    3. Focused advertising: I don't like ads, but I'm willing to put up with their presence in exchange for a service like NYT.

      NYT has done a good job of keeping the impact of the ads low: the ads don't get in the way of reading the stories and they don't slow page loading significantly (since I'm on a slow rural dial-up, that's very important). If NYT starts to charge me, I'll be less tolerant of the ads. If the advertising starts slowing down the page loading, I'll drop my subscription. There are a number of other news services-- CNN, ABC, etc-- that I don't use because the advertising burden slows page loading or otherwise gets in the way.

      As to focused ads-- I'm all for that. I'd rather ignore stuff that's somewhat pertinent to my life than ignore crap I'd never buy. An ad for reading glasses is pertinent to me, but an ad for skateboards is crap-- I was long past skateboarding age before the first ones hit the street. Reading glasses are something me and my cohorts have to live with, and we talk about them. Nobody in my circle of friends has a skateboard and I don't recall ever talking about them. (Of course skateboards would be a problem for me and my neighbors: I don't think they do well on gravel and road apples.)

      And sometimes the advertising actually works-- sometimes it makes me aware of a product or company that I'll want to talk over with my buddies, and maybe try out. That is much more likely with focused ads. As I recall, my first awareness of the existence of fold-up reading glasses in a hard case (suitable for hiking, bicycling, and other hip pocket activities) was from an advertisement. Now I've got a couple of pairs of them. Neat.

    About Google's archive, NYT, and slashdot: Something I hope NYT considers is that the Google archive gives it (and at least some of its ads) exposure in demographic groups that it would otherwise never reach. Such as the tinfoil hat superparanoid geek crowd. While there is no way to develop metrics on this, nor any way to market this to advertisers seeking targeted audiences, this exposure is certainly more beneficial than harmful. Besides, every once in a while somebody matures a little and puts away their tinfoil hat-- and then is a likely candidate for the kind of news service NYT provides.

    So I think it would be very hard for NYT or Google to assess whether the Google cache is harmful or beneficial.

  16. Re:Free registration by Rob+Riggs · · Score: 3, Informative
    There is a significant difference between logging in to a site in order to participate in conversation and logging in simply to read news. At /., posting requires an identity, since anonymous postings are mostly ignored. However, there is absolutely no requirement that one log in to /. in order to read the stories. Your analogy is broken. Privacy should be a choice. At /. one has that choice; with the NYT one does not.

    Another point is that anonymity is one of /.'s greatest strengths. Some of the most insightful and interesting posts have been from "insiders" posting anonymously.

    NY Times... user tracking is less sophisticated than slashcode's vital anti troll features.

    Care to back this statement up?

    ...continual complaints on slashdot from people who are obsessed with privacy on the net unless karma is involved

    You seem to be quite willing to give up those rights. And that's OK. But there are people here that feel that privacy is a rather important right. That should be respected as well. Enough people actually thought that privacy was a right of such importance that it is enumerated in the Universal Declaration of Human Rights (see Article 12).

    --
    the growth in cynicism and rebellion has not been without cause
  17. Re:Free registration by zcat_NZ · · Score: 3, Informative

    Adding one little line of code to every one of the myriad of pages on the New York Times website is not a small deal. It's going to involve a lot of paperwork, testing, and coding on the part of a lot of people.

    But it's not one line of text on EVERY page. It's one line of text in /robots.txt, a file that is independent of the rest of the site and never even accessed by ordinary browsers.

    It's probably simpler for Google to create a registry of "do not cache" pages on their end. And it's more their responsibility, anyway, being the ones who created the cache in the first place.

    Google already have exactly such a registry, and they don't even wait for sites to contact them. Their robot -asks- the site (via the recognised standard '/robots.txt' file) if they object to being indexed and/or cached. Most other search engines look for the same file and handle it the same way.

    This is (from my perspective) far better than having to individually register your site with the several hundred search engines that might try to index it.
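
    Seen from the crawler's side, the "asking" might look roughly like the sketch below. This is not Google's implementation; crawler_may_fetch, ExampleBot and the example.com URL are hypothetical, and the robots.txt text is passed in as a string so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt, user_agent, url):
    """Hypothetical crawler check: consult the site's robots.txt text before fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site-wide opt-out that applies to every crawler.
OPT_OUT = "User-agent: *\nDisallow: /\n"

# A site with an empty robots.txt places no restrictions on crawlers.
NO_RULES = ""

url = "http://www.example.com/2003/story.html"
print(crawler_may_fetch(OPT_OUT, "ExampleBot", url))
print(crawler_may_fetch(NO_RULES, "ExampleBot", url))
```

    The point is that the site publishes its wishes once, and every well-behaved crawler can run the same check.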

    --
    455fe10422ca29c4933f95052b792ab2