Web Caching: Google vs. The New York Times

Worst result by presroi · 2003-07-13 22:01 · Score: 2, Interesting

The worst outcome would be a Google database that is not representative of the general web. I simply expect all results in Google to be accessible without registering, paying, or doing anything similar.

Yes. by Naikrovek · 2003-07-13 22:06 · Score: 4, Interesting

Shouldn't the NY Times simply tell Google not to cache their site?

You do realize that this is probably the basis of the "talks" that are going on, right? C|Net (as usual for them and every news agency) is making a big deal of it to get themselves and their advertisers that tiny wee bit more attention. Every little bit helps, I guess.

Check http://nytimes.com/robots.txt in a week.

Re:Yes. by Anonymous Coward · 2003-07-13 22:25 · Score: 1, Interesting

AFAIK, there is no standard for allowing index robots to scan a site, but disallowing cache robots at the same time. In this case, it shouldn't be a problem because the Google bot obviously gets the VIP treatment anyway (or does the Google bot log in to the NYT webserver?). But in the general case, do we need a better robot exclusion standard to distinguish between certain types of robots?
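To make the point concrete: a robots.txt record can only say whether a given user agent may fetch a given URL. The minimal Python sketch below uses the standard library's `urllib.robotparser`; the crawler name `ExampleCacheBot` and the URL are made up for illustration. There is no field in the file that could express "index this, but don't keep a cached copy".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one made-up crawler is shut out, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleCacheBot
Disallow: /

User-agent: *
Disallow:
"""

STORY = "http://www.example.com/2003/07/14/story.html"  # made-up URL


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)

# The only question the file answers is "may this agent fetch this URL?".
print(rp.can_fetch("ExampleCacheBot", STORY))  # False: its own record disallows /
print(rp.can_fetch("Googlebot", STORY))        # True: falls through to the * record
```

Whatever a permitted crawler does with the page afterwards (index it, cache it, or both) is outside what robots.txt can say.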
Re:Yes. by AyeRoxor! · 2003-07-13 23:31 · Score: 2, Interesting

"It's entirely up to the NYT whether to let Google's robots index their site, isn't it?"

I personally think the NYTimes wants Google to continue to cache their stories.

If they use robots.txt, no NYT articles will come up in Google. However, if they *do* succeed in these talks, I presume the articles will still come up, but uncached, and linking to a signup/login screen. It makes pretty good business sense.
Re:Yes. by Arker · 2003-07-13 23:33 · Score: 2, Interesting

A robots.txt would stop Google from indexing the site altogether. They don't want that to happen. They want a Google search to show NYT web pages, but they want to make sure that when the user tries to view one, they have to register with the NYT first. That means Google must still index the page, but not allow access through the cache. Plus, the link must direct to a sign-on page rather than the page itself, but that is something I'm sure the NYT itself could handle, as I think it does now anyway.
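The arrangement described above (full text for the crawler, a sign-on redirect for readers) can be sketched as a small request handler. This is a hypothetical Python illustration, not the NYT's actual setup: `CRAWLER_TOKENS`, `SIGNUP_URL`, and the `Response` type are invented names, and a real site would sit behind a web framework.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Hypothetical values for illustration; not the NYT's actual configuration.
CRAWLER_TOKENS = ("googlebot",)
SIGNUP_URL = "/register"


@dataclass
class Response:
    status: int
    location: Optional[str] = None
    body: str = ""


def serve_article(path: str, user_agent: str, logged_in: bool, text: str) -> Response:
    """Full text for the crawler (so the page gets indexed) and for
    registered readers; everyone else is redirected to the sign-on page."""
    ua = user_agent.lower()
    if logged_in or any(token in ua for token in CRAWLER_TOKENS):
        return Response(200, body=text)
    # Remember which article the reader wanted so sign-on can send them back.
    return Response(302, location=f"{SIGNUP_URL}?url={quote(path)}")


print(serve_article("/2003/07/14/story.html", "Mozilla/4.0", False, "...").location)
# /register?url=/2003/07/14/story.html
```

The weak spot is the User-Agent test: any reader can send a crawler's string, so a site relying on this alone is trusting the client.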

I sincerely hope google shows some balls and tells them to f right off.

They can't have it both ways, either they're on the web or they're not. They've been trying for years to subvert things so they can have their cake and eat it too, and they need to get told no by someone they'll listen to.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
Re:Yes. by StarFace · 2003-07-13 23:40 · Score: 4, Interesting

True, there is no standard, but Google's method of making indexing and caching independently selectable features is well documented and extremely easy to use. You can even tell Google specifically to stop caching, if you don't mind smaller engines caching.
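The two variants Google documents are a `<meta name="ROBOTS" content="NOARCHIVE">` tag, which asks every robot not to cache, and `<meta name="GOOGLEBOT" content="NOARCHIVE">`, which asks only Google. A minimal Python sketch of how a crawler might interpret them; the function name and the `(name, content)` input shape are assumptions for illustration.

```python
def may_show_cached_copy(meta_tags, crawler: str) -> bool:
    """meta_tags: (name, content) pairs taken from a page's <meta> tags.

    A tag named ROBOTS applies to every crawler; a tag named after one
    crawler (e.g. GOOGLEBOT) applies only to that crawler."""
    crawler = crawler.lower()
    for name, content in meta_tags:
        if name.lower() in ("robots", crawler):
            directives = {d.strip().lower() for d in content.split(",")}
            if "noarchive" in directives:
                return False
    # Only NOARCHIVE is checked here; whether to index is a separate decision.
    return True


google_only = [("GOOGLEBOT", "NOARCHIVE")]
all_robots = [("ROBOTS", "NOARCHIVE")]

print(may_show_cached_copy(google_only, "googlebot"))  # False
print(may_show_cached_copy(google_only, "otherbot"))   # True: smaller engines may still cache
print(may_show_cached_copy(all_robots, "otherbot"))    # False
```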

So, it isn't a standard, but it is a piece of cake for NYT to figure out, and indeed, they already have. As the person above said, this is just C|Net trying to be a real news source. The article even says that the method I just described is the focus of their "discussions."

I imagine the discussions, if anything, were initially a friendly lawyer call (if even that), which was quickly diverted to a tech issue and ended with some webmaster at the NYT getting the specifics of how to set up the Times so that Google will still index their pages and bring them up in searches, but not cache them.

--
V

NY Times likes accuracy by Anonymous Coward · 2003-07-13 22:08 · Score: 4, Interesting

The reason they're trying to stop this is that, given the NYT's reputation lately, they keep retracting stories. With the Google cache keeping the original versions around, this could be problematic, and the management/editors/authors could get into trouble again.

I do however dislike Google cache for many reasons. It's bad for privacy.

I'll do what I want! by scudco · 2003-07-13 22:09 · Score: 2, Interesting

I think if the NY Times has a problem with it, then they have the right to stop Google from caching the site, but I do not think it would be a wise decision on their part. The NY Times enjoys being a reputable news outlet, and if they were to lower their readership and, more importantly, not allow everyone to read their articles, it would hurt their reputation slightly. Slashdotters might also turn to a different news outlet, which can only be bad for them.

Might not be all bad... by leshert · 2003-07-13 22:10 · Score: 4, Interesting

As the poster mentioned, Google already has a way to opt out of caching, so "talks" sounds like this is something different. My guess is that Google will become an affiliate of the NYT (in other words, if you hit a NYT link from Google, you're exempted from registering), and will then drop the caching.
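The affiliate guess could be implemented by checking the HTTP Referer header against a list of partner hosts. The sketch below is hypothetical Python; `AFFILIATE_HOSTS` is an invented list and nothing in the article confirms such a deal.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical affiliate list; nothing in the article confirms such a deal.
AFFILIATE_HOSTS = {"news.google.com", "www.google.com"}


def skip_registration(referer: Optional[str]) -> bool:
    """True if the Referer header says the reader arrived from an affiliate."""
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    return host in AFFILIATE_HOSTS


print(skip_registration("http://news.google.com/news?q=iraq"))  # True
print(skip_registration("http://slashdot.org/"))                 # False
print(skip_registration(None))                                   # False
```

The Referer is supplied by the browser and can be omitted or forged, so this works as a convenience rather than real access control.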

Re:Might not be all bad... by leshert · 2003-07-13 22:16 · Score: 2, Interesting

Pardon me for self-replying but it just occurred to me: maybe Slashdot itself might have an interest in becoming a NYT affiliate? Surely the NYT gets a good chunk of pageviews (and therefore ad revenue, modulo the minority that block them) every time one of their articles shows up here.

Registration isn't a 4 letter word by Anonymous Coward · 2003-07-13 22:10 · Score: 2, Interesting

With respect to the NYT, I registered some time ago. I never received any associated spam or experienced any problems, other than trying to think up fake information different from all the other fake information I'd submitted elsewhere.

If registering allows the owners of a website to leverage their success by having a certain number of registered users, all the more power to them. Aside from the one-time sign-up "inconvenience," I don't see any issue, assuming the website operator is either a known entity or otherwise reputable, of course.

As for the issues related to Google's caching, I'm waiting for cached mp3's.

Closing the Google cache "backdoor"... by Homology · 2003-07-13 22:11 · Score: 4, Interesting

will deny access to older newspaper articles.

It's worth remembering that newspapers sometimes edit/remove articles they publish on their homepage. Without a Google cache it may be much harder to verify that a story has indeed been modified.
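Given a cached copy and the live page, the standard library's `difflib` can list what changed. A minimal Python sketch, assuming both versions are already available as plain text; the sample sentences are made up.

```python
import difflib


def changed_lines(cached: str, live: str) -> list:
    """Lines removed from (prefix '- ') or added to (prefix '+ ') the article
    since the cached copy was taken. ndiff's '? ' hint lines are skipped."""
    return [line for line in difflib.ndiff(cached.splitlines(), live.splitlines())
            if line.startswith(("- ", "+ "))]


cached = "Officials said the report was accurate.\nThe vote is on Tuesday."
live = "Officials said the report was inaccurate.\nThe vote is on Tuesday."
for line in changed_lines(cached, live):
    print(line)
```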

And out comes the lawyers... by Anonymous Coward · 2003-07-13 22:11 · Score: 5, Interesting

Don't you just hate it when promising new technology is curbed by outdated laws?

Here in Denmark we had a service similar to news.google.com for Danish newspapers. The newspaper organisation sued the service for parasitizing their databases (which is prohibited in Denmark). The service was shut down half a year ago, and now we don't have that kind of service anymore.

Of course newspapers should be allowed to publish their stuff without others copying it but they refused to even use a "robots.txt" (which the news service respected) to stop indexing.

If you publish your stuff on the internet and don't tell people that they should not index it, cache it or what do I know - then you better expect them to do that. Let us put those lawyers back where they belong.

Anyone above this post hasn't read the article. by banal+avenger · 2003-07-13 22:16 · Score: 5, Interesting

The Internet Archive, which I just used minutes ago to find a handy page removed years ago, is an interesting corollary to the Google cache. I often wonder how it has survived this long without a major lawsuit. It also reminds me how crappy the web looked 5 years ago.

At any rate, caching is an important force on the internet, and not one that should be limited in any legal way, including litigation.

actually... by Draghkhar · 2003-07-13 22:20 · Score: 5, Interesting

Actually, the NYT has already begun using Google's NOARCHIVE option to prevent content caching. Here's an excerpt from the source of this morning's front-page story:

<!-- ADX SETUP: page: www.nytimes.com/yr/mo/day/international/worldspecial/14IRAQ.html positions: Top5,Middle1,Right3,Middle5,Right,Travel7,Travel11,Bottom1A,Bottom3A,Right5,Right6,Right7,Right8,Bottom8,Bottom7,Inv1,Inv2,Inv3,Frame4,Right4 kwds: politics+and+government;international+relations;iraq;suggested%5ftopnews;suggested%5finternational;suggested%5fworldspecial;suggested%5fmiddleeast -->

<meta name="ROBOTS" content="NOARCHIVE">

Kind of makes me wonder what's the point of the story, since it even says there's an easy way for concerned parties to opt out of the cache.
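Checking a page for that tag is easy to automate. A minimal Python sketch using the standard library's `html.parser`; the class and function names are invented for illustration, and only `ROBOTS` and `GOOGLEBOT` meta names are collected.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="ROBOTS"> / <meta name="GOOGLEBOT">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased directives

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = {k: (v or "") for k, v in attrs}
        name = attrs.get("name", "").lower()
        if name in ("robots", "googlebot"):
            found = {d.strip().lower() for d in attrs.get("content", "").split(",") if d.strip()}
            self.directives.setdefault(name, set()).update(found)


def robots_directives(html: str) -> dict:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOARCHIVE"></head><body>...</body></html>'
print(robots_directives(page))  # {'robots': {'noarchive'}}
```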

Re:Worst result .. it's reality already by jkrise · 2003-07-13 22:20 · Score: 2, Interesting

Gradually, Google's built up a 'good guy' image, and now looks like they're going for the kill. Already Google seems to be the only search site around, and they censor and distort like mad.

Look up the word "Googlewash" and you'll find a lot of info on the referenced article from The Register (it's available now; earlier it was censored). Incidentally, the affected article was an NYT op-ed piece!

--
If you keep throwing chairs, one day you'll break windows....

Sentient Crawlers? by MegaT · 2003-07-13 22:25 · Score: 2, Interesting

How does Google manage to cache a page which requires free registration anyway? Are the crawlers that smart?

Illogical. by HanzoSan · 2003-07-13 22:26 · Score: 4, Interesting

Trying to sell web pages is like attempting to sell MP3s on a P2P service where all MP3s are free.

It won't work. Instead, you should use your website to market and sell your magazine subscriptions.

Like Wired.

--
If you use Linux, please help development of Autopac

Re:Google - more useless everyday by jkrise · 2003-07-13 22:26 · Score: 2, Interesting

Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention.

A May article was referenced in Google, but the link pointed to a March 6th article. How can computer algorithms cause this?

While this may lead to some occasionally unusual and contradictory groupings, it is exactly this variety that makes Google News a valuable source of information on the important issues of the day.

Just search for Googlewash using Google. Read story in TheRegister (it's not delisted now). Watch hypocrisy in action. Roll up eyes in disbelief. Adjust tin foil hat.

--
If you keep throwing chairs, one day you'll break windows....

Re:Google - more useless everyday by MonTemplar · 2003-07-13 22:29 · Score: 4, Interesting

First off, Google News is still in Beta at the moment.

Second, the Google News database only goes back a month or so, probably by design.

Third, I was able to search for 'site:slashdot.org microsoft oregon' on Google just fine this morning. Got 243 results, and the Google Cache has copies of the first three pages returned, which relate directly to the Oregon bill you use as your example.

So, where is the problem?

--
-MT.

Archives by Daemonic · 2003-07-13 22:33 · Score: 2, Interesting

I think newspapers expect their archives to be real revenue generators in the future. ISTR journalists/columnists getting annoyed a few years ago when these archives started to appear, as they weren't getting paid any extra money for having their work effectively republished, but I suppose any such legal arguments have been resolved one way or another by now.

Re:Free registration by cobbaut · 2003-07-13 22:34 · Score: 5, Interesting

Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?

I always use a different address to register online in the form of website@mydomain.
I registered with the NYT in 1999, I never received a single spam on this address.
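The one-address-per-site trick is simple to script. A minimal Python sketch, assuming you control a domain with a catch-all mailbox; `mydomain.example` is a placeholder and both function names are invented.

```python
def signup_address(site: str, domain: str) -> str:
    """'www.nytimes.com' -> 'nytimes@<domain>': one address per site."""
    host = site.lower()
    if host.startswith("www."):
        host = host[4:]
    return f"{host.split('.')[0]}@{domain}"


def leaked_by(to_address: str, domain: str):
    """If spam arrives at one of these addresses, the local part names the site."""
    local, _, dom = to_address.lower().partition("@")
    return local if dom == domain.lower() else None


print(signup_address("www.nytimes.com", "mydomain.example"))    # nytimes@mydomain.example
print(leaked_by("nytimes@mydomain.example", "mydomain.example"))  # nytimes
```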

--
European Linux user, living in Antwerp

Registration should not be necessary by Anonymous Coward · 2003-07-13 22:48 · Score: 1, Interesting

"Yet a large number of people, of this online community at least, refuses to provide even a minimal amount of information (and no money) so that the newspaper can try to make its online presence profitable"

There are scads of newspapers that offer online services for free without the hassle of forcing you to enter "Elmer Fudd at zip code 90210" registration information.

The whole registration thing has to go: it is just one example of how the NYT just doesn't cut it.

slashdot does not cache. by leuk_he · 2003-07-13 22:49 · Score: 2, Interesting

Slashdot decided never to cache a site themselves (see the FAQ). As a result, many sites have died in the process of being /.ed.

Why doesn't /. cache the articles? Too much legal work, I suppose. Why does Google get away with it? Did they do the legal work?

Re:Google - more useless everyday by dhodell · 2003-07-13 22:51 · Score: 5, Interesting

Just FYI, this behavior is due to the fact that Googlebot has a sort of "built-in" mechanism to ignore (or at least rank lower) forum-type sites. Since /. is primarily a "news headline and discussion" site, Google will not rank it as highly as one that seems to be more "on-topic". This is because there is no guarantee that any URLs or email addresses within the page have anything to do with the actual page content itself.

Outside of user posts, /. has little genuine unique content. It summarizes a lot of headlines; this content is not unique.

Other (large) factors determine the way Google ranks pages, including the "PageRank" feature. There are lots of documents about the way Google ranks sites; I suggest checking them out. The best way is probably to Google for it :).

Anyway, this is a bit more on-topic:

I highly appreciate Google's caching feature, and don't see how it can be taken as "bad".

This is what's "bad" about Google, and what I expect will, at some point, come back to haunt them. For instance, if I want to get serial numbers without porn popups, I can usually search for something like "Office XP Serial Number Serialz Warez" or something similar. Within the first couple of pages, I will probably find my serial number in the text of the page description.

If not, it's on the page, oftentimes without a popup, since the serial/crack page itself is the one linked.

Want to find X-Win32? How about searching for "* * * * * * xwin32*.exe"? Let's get some directory listings containing this filename.

No doubt this proves that Google is more than just a search machine... but I think that their superior techniques will definitely come back to haunt them in the future. NYT is way off target with bitching about their caching features... you can turn this off easily, and there are a plethora of scripts one can use to break out of Google's cache and send someone to the main site (or, perhaps, login area in the case of NYT).

But, in other news, Google might need to watch out...

--
Kind regards, Devon H. O'Dell

Re:Free registration..some implications by bigbob2k02 · 2003-07-13 22:52 · Score: 4, Interesting

"Actually, free reg requires a valid email id. It thus filters most bogus registrations."

I don't find that to be true. Maybe you need to save the random login page onto a local computer, or maybe you need to block referrers with a firewall, but the random login works well for me, for viewing pages:

www.majcher.com/nytview.html

Use it frequently and often!

Currently I see "Welcome, paohjjkmtpfd."

At Washingtonpost.com, they only want gender, year of birth, Zip or Country. Pick most randomly, but always use 1984 for birth year.

"Secondly, news sites are planning to go the 'pay' way in about a couple of years."

More and more archives are already going pay per view.

Caching of news sources ensures data integrity by mikeophile · 2003-07-13 23:06 · Score: 4, Interesting

In the old paradigm of news publishing, the product was printed indelibly on paper.

Hardcopy newspapers can't be erased or amended to suit whatever powerful interests might be embarrassed by the truth.

Web-based publications may not enjoy that protection if they are archived by only one source.

Not allowing independent caching of news is just another step closer to historical revisions and distortions.

I'm not trying to say that such a thing is inevitable, but it would make things a great deal easier for those who would be inclined to manipulate the public.

If you don't want it read, don't put it up! by Anonymous Coward · 2003-07-13 23:12 · Score: 1, Interesting

" i think the "point" is google are copying content first without permission and leaving the site owner to do the work of removing/robots.txt"

Any good search engine should ignore "robots.txt". If you don't want it read, don't put it up on the web in the first place.

"Google is making copies of all the Web sites they index and they're not asking permission,"

So sue every user with Netscape, MSIE, and Opera: they are copying web content into their own caches all the time.

So "Opt-Out" isn't good enough for companies? by UnifiedTechs · 2003-07-13 23:18 · Score: 3, Interesting

Anyone else see the irony? Big business feels that "Opt-Out" is a fair policy when advertising to their customers by phone and spam. But when Google gives them an easy and accessible way to opt out of its caching system, using robots.txt and the NOARCHIVE meta tag, that isn't enough for them, and they feel opt-in is the only way to go.

--
iRepairIT - iPhone, Mac, & PC Repair

Google's cache by swilver · 2003-07-13 23:26 · Score: 4, Interesting

I like Google's caching quite a lot. I use it almost exclusively these days before visiting the actual page (if I even get that far). Using Google's cached link has these advantages:

1) Speed... Google's cache is fast. If there's one thing that annoys the heck out of me, it's websites that take more than 5 seconds to load. This is especially annoying when it's caused by JavaScript, slow servers, or popup ads, when Google can serve me effectively the same page in under a second, especially when I'm not even sure it is the right page, the one I'm looking for.

2) Nice highlighting, so I can quickly page down to whatever I was looking for (now if only Google blocked those Tripod background pictures, which make their cached pages unreadable...). Sometimes I wish Google made the highlighted terms at the top clickable, so that clicking one jumped straight to the first appearance of that keyword.

3) Using Google's cached links usually blocks silly popups and other annoying stuff too many websites seem to incorporate these days.
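The highlighting and "jump to first appearance" features from the list above can be approximated in a few lines. A minimal Python sketch; Google's cache used coloured backgrounds, whereas this version just wraps matches in bracket markers, and the function names are invented.

```python
import re


def highlight(text: str, terms, start: str = "[", end: str = "]") -> str:
    """Wrap every case-insensitive occurrence of any search term in markers."""
    terms = sorted({t for t in terms if t}, key=len, reverse=True)  # longest first
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: start + m.group(0) + end, text)


def first_hit(text: str, term: str) -> int:
    """Offset of the term's first appearance (-1 if absent): a 'jump to' target."""
    return text.lower().find(term.lower())


print(highlight("Google caches the Times; the times change.", ["times", "google"]))
# [Google] caches the [Times]; the [times] change.
```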

Perhaps I'll make a proxy server which browses the web exclusively through Google's cache... word highlighting on all pages, fast browsing everywhere, and working links to more cached pages... should work fine for any web pages below 100 kB :)
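The core of such a proxy is rewriting every link on a page so it points at the cached copy. A hypothetical Python sketch, assuming cached copies are reachable through Google's `cache:` search operator (the exact URL form Google serves may differ); the regex rewrite is deliberately naive, and the 100 kB limit is not modelled.

```python
import re
from urllib.parse import quote

# Assumption: cached copies are reachable via the "cache:" search operator.
CACHE_QUERY = "http://www.google.com/search?q="


def cache_url(url: str) -> str:
    # Keep ':' and '/' readable; escape '?', '&', '=' so they stay inside q=.
    return CACHE_QUERY + quote("cache:" + url, safe=":/")


def rewrite_links(html: str) -> str:
    """Naive regex rewrite: point every absolute href at its cached copy."""
    return re.sub(r'href="(https?://[^"]+)"',
                  lambda m: 'href="%s"' % cache_url(m.group(1)), html)


print(rewrite_links('<a href="http://example.com/a.html">next</a>'))
# <a href="http://www.google.com/search?q=cache:http://example.com/a.html">next</a>
```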

As for the NY Times being annoyed with Google's cache, they can easily fix that themselves. Either that, or Google's spiders are a lot smarter than I thought and automatically register themselves with the NY Times. Furthermore, as far as I'm concerned, everything that's publicly accessible on the web without some form of password protection (which would of course also block robots) should be cacheable and archivable in whatever form you see fit. Respecting robots.txt is no more than a courtesy as far as I'm concerned. If you don't want your pages to be archived or cached or whatever, then by all means protect your pages, or do not put up a web page in the first place (I'm sure a thousand others will leap at the chance to fill the void).

--Swilver

Problem is potentially bigger than caching Re:Yes. by leoaugust · 2003-07-13 23:28 · Score: 4, Interesting

I wonder where this will stop. The NYTimes might get Google to stop caching the direct link for a certain article. That is fine. But it is just one more step to search Google for the article using a few keywords from it. If anyone has been good enough to save it on a personal page, a discussion board (as is traditionally done for articles likely to be slashdotted), or any other place, the Google results will show it. Would the NYTimes then want to restrict Google from showing these pages because of the copyrighted material? You would be amazed how many articles I find this way. Many of them are just excerpts, but others are complete.

Another thing, on a tangent: I really hate that information is restricted, for one fundamental reason. If it is not commonly available, I cannot link to it in most of my writing, because it will be unavailable to the people I am writing for. This is especially true if the writing is not meant to be read immediately but a month or two later. It is also relevant to bloggers, who might comment on and link to an article, only to have the link go dead because the content is restricted in space, time, or space-time. I am willing to pay to read the articles, but before I can write about them I need to ensure that they will be available to my readers. And in the blogging or P2P scenario, I don't know whether one person or thousands will read my writing, so buying a license for them is illogical. And then, if they pass it on, are they also supposed to pay? Basically, to be able to write, to build upon existing work, to look ahead standing on the shoulders of giants, I need to be able to pass on the information. I add value by putting that content in context, but until I can freely share the underlying articles too, my product is stunted: I can reach a narrow audience, but not the general one. All this works well in software development, where you might negotiate a deal once in a while to include someone's underlying code, but not in writing, where you might be writing 10-15 articles a week...

Basically all I am saying is that there should be a movement similar to Open Source not only for software products, but for journalistic content.

--
To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...

Art Spam by Stone+Pony · 2003-07-13 23:37 · Score: 3, Interesting

I registered with the NYT about a year ago and I've had little or no spam as a result. I say "little or no" because I did get an e-mailed bulletin about the world fine-art market twice a week or so for several months. I assumed that this was a result of registering with NYT because it seemed to fit the "NYT demographic" rather better than any of the other things I've ever registered for.

Is there an easy way of telling where this stuff originates? (Spam isn't enough of a problem for me, touch wood, that I'm willing to spend ages looking into where it comes from.)
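One partial answer is the chain of Received: headers that each mail relay prepends to a message. A minimal Python sketch using the standard library's `email` package; the hostnames and message are made up. Only the headers added by your own mail servers are trustworthy, since anything earlier in the chain can be forged by the sender. If you register with a distinct address per site, the To: address is usually the quicker clue.

```python
from email import message_from_string


def relay_chain(raw_message: str) -> list:
    """Received: headers, oldest hop first. Each relay prepends its own header,
    so the raw order is newest first; reverse it to read from the origin."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    return [" ".join(h.split()) for h in reversed(hops)]  # collapse folded whitespace


raw = (
    "Received: from relay.example.net by mx.mydomain.example\n"
    "Received: from sender.example.org by relay.example.net\n"
    "To: nytimes@mydomain.example\n"
    "Subject: Fine art market bulletin\n"
    "\n"
    "body\n"
)
for hop in relay_chain(raw):
    print(hop)
# from sender.example.org by relay.example.net
# from relay.example.net by mx.mydomain.example
```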

What the hell is wrong with you? by Rogerborg · 2003-07-13 23:44 · Score: 2, Interesting

Is there some problem with readers, with editors, hell, with story submitters, actually reading the damn article before making snide speculations?

"We're going to [fix] it so when you click on a link it will take you to a registration page," said Christine Mohan, a spokeswoman at New York Times Digital, the publisher of NYTimes.com.

That's why they don't just tell google to not cache. They want the links to appear, but not to the stories themselves.

How about we discuss that issue, rather than some other, theoretical issue? I know it's an alien concept, but let's give it a try.

Here, I'll start it off. It looks like a decent idea. Google still gets the links, the NYT still gets the traffic, everyone gets to find the articles they want. What's not to like?

--
If you were blocking sigs, you wouldn't have to read this.

Do not mess up Google by Mostly+a+lurker · 2003-07-14 00:41 · Score: 2, Interesting

Google is one of the few complex services on the web that is almost always relevant when one tries to use it. The Google cache is one great feature. If they manage to unnecessarily gut that, I wonder what other features they will find to complain about next.

Login user tracking by Anonymous Coward · 2003-07-14 01:27 · Score: 1, Interesting

Has anyone else but me noticed that if you click a link off from http://news.google.com that points to a NYT article, that you do not have to log in to read it? Google and the NYT must have some sort of arrangement regarding that. I don't see what the problem with having their pages cached is anyway. They must have special provision to allow google to read the articles without logging in anyway or they would not get crawled.. Anybody try browsing the NYT with Googlebot as their user agent string? I still had to log in. Probably the actual user agent string that googlebot uses has some numbers etc after it. I still bet that if you used the actual user agent string as the googlebot, that you would find that you did not have to log in to view stuff that you did before. I ought to try that whilst browsing for pr0n. I bet you wouldn't have to log in to some pay sites.
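Changing the User-Agent is just a matter of setting one request header, as in the Python sketch below. The Googlebot string used here is an assumption; the real crawler's string has varied over time. A site that also checks where the request actually comes from (for example, whether the IP address belongs to Google) will not be fooled by the header alone, which may be why such an experiment still ends at a login page.

```python
from urllib.request import Request

# Assumption: a Googlebot-style User-Agent string; the real one has changed over time.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def as_googlebot(url: str) -> Request:
    """Build (but do not send) a request that claims to be Googlebot."""
    return Request(url, headers={"User-Agent": GOOGLEBOT_UA})


req = as_googlebot("http://www.example.com/2003/07/14/story.html")
# urllib stores header names in capitalize()d form, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of any network access.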

Re:Free registration by NexusTw1n · 2003-07-14 01:51 · Score: 2, Interesting

Slashdot allows you to view the content but not post even anonymously without being tracked at IP address level.

The NY Times allows you to view the archive anonymously, and allows you to view the main page with a password you could easily find via Google.

So yeah, it's nothing like slashdot's system - NY Times intrudes far less into your privacy.

--
It has become appallingly obvious that our technology has exceeded our humanity. --Albert Einstein

Complaining about being tracked by clary · 2003-07-14 02:05 · Score: 2, Interesting

I am not worried about being tracked, but rather don't find the content of the NY Times compelling enough to bother acquiring one more username and password. On the other hand, I've registered with slashdot for the amusement of karma and to my.yahoo for the spamcatcher email account and personalized weather.

I don't complain to Slashdot, but I did email the NY Times and tell them so. (They graciously offered to sell me a paper subscription, no email registration required. ;-)

I also don't avoid no-registration links or Slashdot posts that contain copies of NY Times articles. I guess that makes me a hypocrite.

--

"Rub her feet." -- L.L.

Re:Free registration by keithdowsett · 2003-07-14 02:14 · Score: 2, Interesting

I'm continually surprised by how often these free registration programs will accept root@localhost or abuse@localhost in the e-mail field.

This has the pleasing consequence that, unless they employ someone to vet the list, they are likely to end up spamming themselves or their provider. Both are much more amusing than sending it to a completely fake address.

Naturally, the rest of these forms must be treated as an exercise in creativity - and we should give our creations suitable names. My favourites include Hugh Jorgens and Tess Tickle.

So, treat these forms like you treat the religious nuts who arrive on your doorstep preaching salvation - as a source of amusement.

Speaking of which... by arhca · 2003-07-14 02:29 · Score: 3, Interesting

Why doesn't Slashdot cache news articles and stories before running a story? It would make a lot of sense for text-based news items.

Google News & Subscription sites by frostman · 2003-07-14 02:35 · Score: 3, Interesting

I use Google news pretty regularly, and I've noticed that some of their links are to paid subscription sites. These are clearly marked as such ("subscription").

I don't generally click on those links, but I think it's a good idea, since I'm not actually going to Google for the news, rather for links to the news. The reason I personally don't click on the subscription links is that I have my favorite set of real newspaper sites (some registration, some free, some not) and that's not what I'm using Google News to find. Someone else, however, probably is using it that way.

I would guess that Google gets something back from that sort of link, since the site owner is getting more from the link than Google is from the listing. (Maybe I'm wrong, of course.)

It makes perfect sense to have something like that for the regular search engine, and to charge for it, as long as it doesn't affect the link's rank in the search results.

For example they could have a special command for robots.txt (or google.txt maybe) that would allow Google to access and cache the page, but the regular link would go to some registration page (easy to do) *and* the cache link would also go to some kind of registration page, defined in the google.txt file.
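To make the proposal concrete, here is a sketch of how a crawler might read such a file. Everything in it is hypothetical: no `google.txt` exists, and the directive names (`Allow-Index`, `Allow-Cache`, `Cache-Redirect`) are invented for illustration in Python.

```python
# Everything here is hypothetical: no such file or directives exist.
SAMPLE = """\
# google.txt (hypothetical)
Allow-Index: /
Allow-Cache: /
Cache-Redirect: /register
"""


def parse_google_txt(text: str) -> dict:
    """Read 'Key: value' lines into a dict with lowercased keys.
    Anything after '#' is treated as a comment and dropped."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        rules[key.strip().lower()] = value.strip()
    return rules


def cache_link_target(rules: dict, cached_copy_url: str) -> str:
    """Where the 'Cached' link should point: the site's registration page if
    one is declared, otherwise the cached copy itself."""
    return rules.get("cache-redirect", cached_copy_url)


rules = parse_google_txt(SAMPLE)
print(cache_link_target(rules, "http://cache.example/copy"))  # /register
```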

The NYT would promise that the cached page is really the cached page, and pay Google something for redirecting to NYT's cache (with registration). Or even better, there would be some kind of redirect where I actually get the cache from Google after I've registered with NYT.

They're probably thinking of something like that, because otherwise the solution would be to simply disallow caching, and that wouldn't be news, would it? ;=)

--

This Like That - fun with words!

Google's cache copy - the larger issue by Everyman · 2003-07-14 02:36 · Score: 5, Interesting

The question is framed very narrowly by Slashdot, so this discussion misses the larger issues. The cache copy is an issue in Google's main index for many webmasters. The Google News situation is a subset of a larger problem; the cached link doesn't exist in Google News. Google News is a much narrower issue. I'd like to bring up the issue of full-text caching done by Google in their main index.

My problem with the cache is that it gives Google a competitive advantage that is unfair, and furthers their monopoly. This is especially unfair since it is most likely illegal -- assuming that you could ever get a good test case into court, or get a class action lawsuit going by some webmasters, publishers, or search engines.

To add to the attractiveness of the cache copy, consider what Google has done:

1) The cache copy makes it possible to highlight the search terms, whether or not you have the toolbar installed.

2) The download time for the cache copy from Google's servers is always faster than from the original website.

3) You never get a 404 "not found" or a DNS lookup failure for the cache copy.

4) The link to the page recommended by Google for bookmarking at the top of the cache copy is a link to Google's copy, not to the original page.

5) How about all that Google branding on the top of the cache copy? Priceless. I feel the cache should be opt-in, not opt-out. The only way you can avoid it right now is to place a "noarchive" meta on every page in your site. On some file types, such as .txt files, there's no place to insert a "noarchive" and Google goes ahead and caches it anyway.

The cache copy tends to keep eyeballs on google.com, and increases their searches. You may have noticed that many major news sites won't link to other websites in their stories anymore, but rather just mention the relevant site without putting a link behind it. That's because they don't want eyeballs wandering off of their page. A wandering eyeball may not come back and look at more ads. That's basically one of the big reasons behind the cache copy as well -- it keeps eyeballs from wandering as much as they would without the cache.

All the Google partners -- AOL, Earthlink, Yahoo, Netscape -- don't include the cache links, and I assume that this is the reason. They don't want people wandering off to Google and staying there.

As new competition is organizing to challenge Google's monopoly, from places such as Overture (Alltheweb and AltaVista), Yahoo (Inktomi), AskJeeves/Teoma and Microsoft, these engines have to consider whether to fight Google on the cache copy, or offer their own cache copy even if they think it is illegal. There isn't really any middle ground on this.

Many observers with legal expertise feel that while the snippets are "fair use" of a website's content, offering the full text in a cache version is not. Copyright law requires "express permission," but Google only offers an incomplete and inconvenient opt-out. I suspect that the legal departments of these other engines are more inclined to challenge Google rather than launch into their own violations of copyright law.

Re:Free registration by yelvington · 2003-07-14 02:48 · Score: 2, Interesting

So, regarding 1, 2, and 3 - the advantage to me as a consumer is what?

Relevancy of information. Think about it for a minute. Advertising that is not useful is noise. Advertising that is useful isn't noise. Targeting replaces the noise with utility.

If you're looking for a house, informative advertising from mortgage lenders (real lenders, not the scam artists who clog your email box) is useful. If you're hungry, you might find targeted pizza coupons from the pizza joint around the corner to be useful. And so forth.

You also might consider that making the Web site profitable ensures its survival, which ought to be an advantage to you, assuming that you care to use it.

The Paywall by ka9dgx · 2003-07-14 03:43 · Score: 2, Interesting

The fact is, they want to hide things behind a paywall, and most folks resent it. They need to decide a few things:

Are they going to expose their archives, and enjoy the benefits of far more exposure (on google, etc), or hide behind a paywall, and complain that they're increasingly irrelevant.
Are they going to start practicing journalism, or stick with their current direction of corporate propaganda for the masses? (and become increasingly irrelevant)

--Mike--

Re:Free registration by vTalon · 2003-07-14 10:31 · Score: 2, Interesting

they don't have to change the whole site; they just need to add ONE LINE of text to ONE plaintext file.

How hard is that?

First rule of web design (or programming, or anything having to do with a computer or any other complex system): that little change that you make in five minutes that you know won't affect anything -- the one you don't test because it's so minor -- that change is going to bring the whole page/program/computer/national defense system crashing down around your ears.

Adding one little line of code to every one of the myriad of pages on the New York Times website is not a small deal. It's going to involve a lot of paperwork, testing, and coding on the part of a lot of people.
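For static HTML files, the text change itself can be scripted, as in the sketch below; the testing, review, and deployment around it are where the cost described above lies. This Python sketch assumes each page is a plain string with a `<head>` tag, and the function name is invented. It is idempotent and leaves pages without a `<head>` alone.

```python
import re

NOARCHIVE = '<meta name="ROBOTS" content="NOARCHIVE">'
_HEAD = re.compile(r"<head\b[^>]*>", re.IGNORECASE)  # matches <head ...>, not <header>
_HAS_NOARCHIVE = re.compile(r'content\s*=\s*"[^"]*noarchive', re.IGNORECASE)


def add_noarchive(html: str) -> str:
    """Insert the NOARCHIVE meta tag right after <head>, once."""
    if _HAS_NOARCHIVE.search(html):
        return html  # already present; leave the page alone
    m = _HEAD.search(html)
    if m is None:
        return html  # no <head>: nothing safe to do
    return html[:m.end()] + NOARCHIVE + html[m.end():]


print(add_noarchive("<html><HEAD><title>t</title></HEAD></html>"))
```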

It's probably simpler for Google to create a registry of "do not cache" pages on their end. And it's more their responsibility, anyway, being the ones who created the cache in the first place.

Re:There's no such thing as free registration by pyrrho · 2003-07-14 10:32 · Score: 2, Interesting

it's not free, the price is registration.

Barter did not quite die out as advertised in the 20th century (that'd be that last bloody century).

People are confused because they don't think of the economy as barter, but it is; money just lubricates a basic system of barter.

GPL code is not free, the cost is your commitment to share your changes to the code with whomever you share a binary with.

Nytimes.com is not free, the cost is registration, so they know more about their users.

etc. etc. no-money-required != free.

--

-pyrrho

45 of 518 comments