Web Caching: Google vs. The New York Times
An anonymous reader writes "The Google cache is a popular feature among karma fetishists. Many stories with links to the NY Times attract comments pointing to Google's copy of the article. This gives readers access to the content without registering. C|Net reports that Google is in talks with the NY Times to close this backdoor. The article raises some general concerns regarding the caching of web content. Shouldn't the NY Times simply tell Google not to cache their site?"
I'd love to see their user database, just to count the number of Mickey Mice and Elmer Fudds on there. Apart from giving the NYT your e-mail addy for spam purposes, what real point is there to free registration?
When I am king, you will be first against the wall.
The worst outcome would be a google-database which is not representative of the general web. I simply expect all results in google to be accessible without registering, paying or doing anything similar.
Now we can't karma whore by linking to the google cache?
IANTrolling here, but I find Google more and more useless by the day. Sometime back, I pointed out how Google seems to have a soft corner for articles and sites that affect big firms such as Microsoft.
In fact, several of Slashdot's own articles on Microsoft aren't available from Google news, although Slashdot is listed as a 'news' source. Couple of MS related Slashdot articles (on the Oregon bill - March 6th and May) have been removed, but much pro-MS content pre-dating March is still referenced.
Google seems to be aping the other Gorilla, despite all the posturing, and Microsoft's so-called attempts to categorise it as a competitor, when in fact, Google appears to be an ally!
If you keep throwing chairs, one day you'll break windows....
Shouldn't the NY Times simply tell Google not to cache their site?
You do realize that this is probably the basis of the "talks" that are going on, right? C|Net (as per usual for them and every news agency) is making a big deal of it to get themselves and their advertisers that tiny wee bit more attention. Every little bit helps, I guess.
Check http://nytimes.com/robots.txt in a week.
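For anyone who would rather script that check than eyeball the file, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rules below are invented for illustration (they are NOT the NYT's actual file), and the paths and the `allowed` helper are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT the real nytimes.com file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/2003/07/14/story.html", "/archive/old-story.html"):
            url = "http://www.example.com" + path
            print(agent, path, allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

To test a live site instead, the same parser has set_url() and read() methods, but note that robots.txt only controls crawling; it says nothing about whether a page that was crawled may be shown from a cache.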
The reason they're trying to stop this is because with NYT reputation, they keep retracting stories all the time. With Google cache this could be problematic and the management/editors/authors could get into trouble again.
I do however dislike Google cache for many reasons. It's bad for privacy.
I think if the NY Times has a problem with it, then they have the right to stop google from caching the site, but I do not think it would be a wise decision on the part of the NY Times. The NY Times enjoys being a reputable news outlet, and if they were to lower their readership and, more importantly, not allow everyone to read their articles, it would hurt their reputation slightly and all slashdotters might turn to a different news outlet, which is only bad for them.
- When will slashdot stop linking to articles that require a registration?
- When will slashdot consider implementing caching for pages that, by linking to, they manage to take off the internet?
Sure, the 2nd question has been answered in the FAQ. Except it was written three years ago and Google manages this just fine. Maybe time for a second look?
On the topic of site updates, has anyone noticed that 90% of the links on http://slashdot.org/code.shtml don't work any more?
Hell the link to an Avantgo version of Slashdot points to a website which has been broken for over 2 years.
Avantslash - View Slashdot cleanly on your mobile phone.
If your users (readers) need the google cache then there are system problems. Either the technical systems are inadequate (yay slashdot) or your business systems are inadequate (or both). Tech is easy, and so are business systems. Well at least the ones you want simple every-day folks to use...
--
"we live in a post-ideological world..." - Billy Bragg.
As the poster mentioned, Google already has a way to opt out of caching, so "talks" sounds like this is something different. My guess is that Google will become an affiliate of the NYT (in other words, if you hit a NYT link from Google, you're exempted from registering), and will then drop the caching.
The article talks about Google's caching of articles that have expired to the NYT archives (which you have to pay to access). What most /. folks use to link to current NYT articles are the Google partner links, which simply bypass the free registration. I'd assume these links only work as long as an article hasn't been archived yet, so the karma whores are safe; I doubt the NYT's Google partner links will be going away any time soon... ;)
DennyK
With respect to the NYT, I registered some time ago. Never received any associated spam or experienced any problems other than trying to think up fake information different from all the other fake information I'd submitted elsewhere.
If registering allows the owners of a website to leverage their success by having a certain number of registered users, all the more power to them. Aside from the one-time sign-up "inconvenience," I don't see any issue, assuming the website operator is either a known entity or otherwise reputable, of course.
As for the issues related to Google's caching, I'm waiting for cached mp3's.
It's worth remembering that newspapers sometimes edit/remove articles they publish on their homepage. Without a Google cache it may be much harder to verify that a story has indeed been modified.
Don't you just hate it when promising new technology is curbed by outdated laws?
Here in Denmark we had a service similar to news.google.com for Danish newspapers. The newspaper organisation sued the service for leeching off their databases (which is prohibited in Denmark). The service was shut down half a year ago and we now don't have that kind of service anymore.
Of course newspapers should be allowed to publish their stuff without others copying it but they refused to even use a "robots.txt" (which the news service respected) to stop indexing.
If you publish your stuff on the internet and don't tell people that they should not index it, cache it or what do I know - then you better expect them to do that. Let us put those lawyers back where they belong.
The reason that the NYT doesn't just tell google not to cache them is visitors. Let's face it, even though the registration is a bitch the content on the NYT website is fairly decent. They have good articles often enough that geeks went through the effort of finding out how to read without registering. If they have google not cache them, and they close the google news loophole, then they won't appear on google news any longer. And google news is used by many more people than you think.
Hey, we get quite a few visitors from this google news. Let's change it so we get 0 visitors from it.
Duh.
The GeekNights podcast is going strong. Listen!
Leather, rubber, high heels, stockings, now we have karma fetishists that get off on Google!?!?
Bloody geeks!
the nytimes website needs google for the traffic google brings into their pages, so they can't turn away their spiders. but then, they don't want the spiders either because of copyright violations. why should this be google's problem anyway?
Shouldn't the NY Times simply stop requiring registration to read the news?
Shouldn't they concentrate on telling the truth instead?
btw, someone might want to clue in the techs at NYT about the noarchive meta tag that they can put into their pages to prevent google from caching, according to google's own directions.
Seems like they want all the benefits of showing up on google searches, without any cost to them. As if making their take of the news available to more people is really a cost at all.
Actually, free reg requires a valid email id. It thus filters most bogus registrations. Secondly, news sites are planning to go the 'pay' way in about a couple of years. Getting readers to register would give more accurate estimates of readership.
And lastly, once a site requires registration, even if free, Copyright prohibits quoting entire articles on the web. This indeed could be the prime reason for this.
If you keep throwing chairs, one day you'll break windows....
The Internet Archive, which I just used minutes ago to find a handy page removed years ago, is an interesting corollary to the Google cache. I often wonder how it has survived thus long without a major lawsuit. It also reminds how crappy the web looked 5 years ago.
At any rate, caching is an important force on the internet, and isn't one that should be limited in any legal way, including litigation.
Here's a Google cache of the article: ...
(just kidding)
You know you're a geek if you've ever replied to a tagline.
Like other online publishers, The New York Times charges readers to access articles on its Web site. But why pay when you can use Google instead?
Scuse me? I thought the NYT did free registrations?
And you got to love the last part of the article, where they discuss if Google's cache is in fact legal - which should have some bearing on the /.cache some of us wish for when we have taken down yet another interesting website...
Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
You are the new editor of the New York Times, the "Newspaper of Record" for the United States, if not the world. You are, of course, the new editor because the previous editor had to resign, taking the blame for an individual reporter's flagrant disregard for the awe-inspiring credibility of your institution. In the process of rebuilding your credibility, should you:
A) Insist that unaffiliated digital libraries restrict access to or simply eliminate all records of your "Newspaper of Record", or
B) Realize that maybe right about now is not particularly the best time to be saying to the world, "Please forget what we published last week."
Google News already partners with NY Times to allow NY Times articles to show up on their site. If you click a NY Times link on Google News, it shows up as a special NY Times "partner". So, on one hand, they want to get googled to generate traffic, on the other hand, they blame google for caching. Where do you draw the line?
Actually the NYT has already begun using google's NOARCHIVE option to prevent content caching. Here's an excerpt from this morning's front page story's source:
<!-- ADX SETUP: page: www.nytimes.com/yr/mo/day/international/worldspecial/14IRAQ.html positions: Top5,Middle1,Right3,Middle5,Right,Travel7,Travel11,Bottom1A,Bottom3A,Right5,Right6,Right7,Right8,Bottom8,Bottom7,Inv1,Inv2,Inv3,Frame4,Right4 kwds: politics+and+government;international+relations;iraq;suggested%5ftopnews;suggested%5finternational;suggested%5fworldspecial;suggested%5fmiddleeast -->
<meta name="ROBOTS" content="NOARCHIVE">
Kind of makes me wonder what's the point of the story, since it even says there's an easy way for concerned parties to opt out of the cache.
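If you want to check a page for that tag yourself, something like the following would do it. This is only a rough sketch with Python's standard html.parser; the class name, the `has_noarchive` helper and the sample page are made up for illustration, and it looks only at the generic "robots" meta name, not at any crawler-specific one.

```python
from html.parser import HTMLParser

# Made-up page source for illustration; only the meta tag matters here.
SAMPLE_PAGE = (
    '<html><head><title>Example</title>'
    '<meta name="ROBOTS" content="NOARCHIVE">'
    '</head><body>Story text</body></html>'
)


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def has_noarchive(html: str) -> bool:
    """Return True if the page asks robots not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    print(has_noarchive(SAMPLE_PAGE))
```

The comparison is done in lowercase because the tag shows up in the wild both as ROBOTS/NOARCHIVE and robots/noarchive.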
Gradually, Google's built up a 'good guy' image, and now looks like they're going for the kill. Already Google seems to be the only search site around, and they censor and distort like mad.
Consult the word: Googlewash, and you'll find a lot of info on the referenced article from The Register (it's available now, earlier this was censored). Incidentally, the affected article was a NYT OpEd piece!
If you keep throwing chairs, one day you'll break windows....
In case the cnet article is /.'ed, here's a link to the Google cached page.
More than mere navel gazing.
So do we charge the NYT $1,000 to explain robots.txt or $10,000 because they are so stupid...
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Well, I guess that NYT (and many others) allowing Google News to login and index their content means that they like them doing that for getting traffic. For whatever reason, NYT wants you to register and they have a right to as well as they have copyright, allowing Google to put in the snippet, but not the whole article without their consent.
And that is the reason for an index, to find the original.
It is good to see they are working this out together, though, without NYT going to court as the first step. This is a far better way than the popular shoot-first-ask-questions-later attitude most media companies have...
That's the thing - it's not free depending on your definition. By my own definition, you're giving them valuable information, and they get to keep it and use it as they will, including spamming if they feel like it (or spam from any company which buys them out, they sell it to if they're feeling bankrupt, etc). It's practically misadvertising of a service, but it's accepted now, so everyone gets away with it.
If it really were free, why would you need to register in the first place?
Personally speaking, and I'm sure a lot of those who post here do the same, I have a free webmail account that I use to catch the spam associated with all the registration sites, from Ezboard to the NYT.
An infinite number of monkeys will eventually come up with the complete works of
How does Google manage to cache a page which requires free registration anyway? Are the crawlers that smart?
Trying to sell web pages is like attempting to sell mp3s on a p2p services where all mp3s are free.
It won't work. Instead you should use your websites to market and sell your magazine subscriptions.
Like Wired.
If you use Linux, please help development of Autopac
I think newspapers expect their archives to be real revenue generators in the future. ISTR journalists/columnists getting annoyed a few years ago when these archives started to appear, as they weren't getting paid any extra money for having their work effectively republished, but I suppose any such legal arguments have been resolved one way or another by now.
Google should not index anything that is not publicly available. When I get a search result I expect to be able to read that search result or it's worthless (and I don't really care if registration is free or not).
If the NYT cares so much about who can read their site they should block google and ask the NSA to archive it instead...
In case google gets /.ed here is its google cache
http://www.google.at/search?q=cache:qw2H0d95VkEJ:www.google.com/webmasters/3.html+%22I+need+my+site+information+changed.%22&hl=de&ie=UTF-8
--
Karma 50, and all I got was this lousy T-Shirt.
Brand recognition is not always a good thing. When I think NY times I think "that annoying registration website". They are free to do what they want, but it leaves me cold.
My Karma: ran over your Dogma
StrawberryFrog
Here we have the NYT, one of the premier news organizations in the world, offering its articles for free on the same day that they are published. Yet a large number of people, of this online community at least, refuses to provide even a minimal amount of information (and no money) so that the newspaper can try to make its online presence profitable.
I think the spam fears are a red herring, I've been registered with the times for over 2 years. I've never gotten spam that I think is traceable from them. I get a daily email of the day's headlines (and with the click of a box I could discontinue this).
Why should the RIAA change its business model to a pennies per song method when there is such a blatant example of the online community refusing to go directly to the source for even free material?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
I like how the NY Times has to opt out of caching, surely caching should be opt IN ?
the facts are that a commercial company A (google) is making a profit from unauthorised copying of other people's content without permission, meaning company B (you) has to spend money (webmaster) or take proactive steps to remove your content from their databases. google are not an ISP or a government agency, so really they have no business taking other people's content without asking.
if this was M$ people would be crying bloody murder, but because google are "cool" they can copy content at will without ramification and sell it (via selling advertising)?
i think it's only a matter of time before google get into trouble. after all, copying content is an infringement in most countries, so why should google be allowed to profit from it?
now if it was a case of "put this code in your site to opt IN to the cache" it would be a different story, but as it stands google's approach is steal/profit first and worry about the legal stuff later, which is a risky position to take.
cheers
Actually, registration is not required to protect a work. Creating a work automatically protects it under copyright law -- no need for registration, user fees, or that little (c) thingy. At least in countries respecting the Berne Convention.
The Mongrel Dogs Who Teach
At least this story is from C-Net. If it were from the NYT with a byline by Jayson Blair (sandwiched between his stories about political upheaval in Grand Fenwick and his new biography of Thomas Crapper), we might have to wonder.
Don't blame Durga. I voted for Centauri.
be nice if you could specify something like:
that's two examples, one or the other would be nice.
US Citizen living abroad? Register to vote!
"Yet a large number of people, of this online community at least, refuses to provide even a minimal amount of information (and no money) so that the newspaper can try to make its online presence profitable"
There are scads of newspapers that offer online services for free without the hassle of forcing you to put in "Elmer Fudd at zip code 90210" registration information.
The whole registration thing has to go: it is just one example of how the NYT just doesn't cut it.
Slashdot decided never to cache a site themselves (see the FAQ). As a result of this, many sites have died in the process of being /.ed.
Why doesn't /. cache the articles? Too much legal work, I suppose. Why does google get away with this? They took on the legal work?
"because google are "cool""
You is saying that just to looks smart, is you?
www.majcher.com/nytview.html
Use it frequently and often!
Currently I see "Welcome, paohjjkmtpfd."
At Washingtonpost.com, they only want gender, year of birth, Zip or Country. Pick most randomly, but always use 1984 for birth year.
More and more archives are already going pay per view.
"I do not support the terrorist regime of USA [troed.se]"
The US is very anti-terrorist. If you don't support it, perhaps you are a terrorist yourself?
Sweden itself has a bad reputation, with Olof Palme supporting genocide in Southeast Asia.
Registration has nothing to do with copyright, either strengthening or weakening it. What law is this? Please provide a citation for this claim.
"And you wonder why you get ads that have absolutely no interest for you? And why advertisers have to shout louder and louder to get through a mass of untargeted ads?"
What ads? I ignore or block such ads out of principle. Maybe if they provided ads for something worthwhile (instead of "shock the monkey" deceptive scam links), they would not be ignored. Maybe instead of shouting gibberish louder and louder, they should provide good ads for worthwhile products and services.
Hardcopy newspapers can't be erased or amended to suit whatever powerful interests might be embarrassed by the truth.
Web-based publications may not be immune to such tampering if they are archived by only one source.
To not allow independent caching of news is just another step closer to historical revisions and distortions.
I'm not trying to say that such a thing is inevitable, but it would make things a great deal easier for those who would be inclined to manipulate the public.
Dupes, retractions, poor editing... is /. owned by the Times?
I've never been sent a single spam from the NYT. The reason they want this is for demographics. A) it tells them who their web readers are, and B) it tells their advertisers who their web readers are. And it also allows them to show ads for products people would be most interested in.
autopr0n is like, down and stuff.
Does anyone have a free NYT registration and password which has been given out and shared a lot?
Wired the magazine and Wired the website are totally separate companies. The website is owned by Lycos, and the magazine by Conde Nast.
autopr0n is like, down and stuff.
Set up a disposable e-mail account and use that.
" i think the "point" is google are copying content first without permission and leaving the site owner to do the work of removing/robots.txt"
Any good search engine should ignore "robots.txt". If you don't want it read, don't put it up on the web in the first place.
"Google is making copies of all the Web sites they index and they're not asking permission,"
So sue every user with Netscape, MSIE, and Opera: they are copying web content into their own caches all the time.
Anyone else see the irony that big business feels that "opt-out" is a fair policy when advertising to their customers by phone and spam, but when google gives them an easy and accessible way to opt out of their caching system by use of robots.txt and the NOARCHIVE meta tag, that isn't enough for them and they feel opt-in is the only way to go?
iRepairIT - iPhone, Mac, & PC Repair
I can go to my library and read the NYT without registering. What's the big deal?
It's a good thing, too, since in post-USA-PATRIOT Act America, what you read can and will be used against you in a secret military tribunal.
You see? You see? Your stupid minds! Stupid! Stupid!
Actually, registration is not required to protect a work.
True, but what do you mean by "protect"?
In the USA, you cannot sue for statutory damages and legal fees if your work is not registered with the government.
Theoretically, you can take someone to court for copyright infringement, but what this will mean in terms of money is another thing entirely, and so your "protection" is meaningless unless you either have the up front bucks to stop an infringer/defend your IP, or have registered with the government (ie paid the copyright "protection money") and find a lawyer to fight your case on a contingency basis.
I like google's caching quite a lot. I use it almost exclusively these days before visiting the actual page (if I even get that far). Using Google's cached link has the advantage of:
:)
1) Speed... Google's cache is fast. If there's one thing that annoys the heck out of me, then it's websites that take more than 5 seconds to load. This is quite annoying when it's caused by javascripts, slow servers or popup ads when Google can serve me effectively the same page in under a second -- especially when I'm not even sure if it is the right page, the one I'm looking for.
2) Nice highlighting so I can quickly page down to whatever I was looking for (now if only Google blocked those Tripod background pictures which makes their cached pages unreadable..) Sometimes I wish Google made their highlight examples at the top clickable so it jumped to the first appearance of the keyword immediately.
3) Using Google's cached links usually blocks silly popups and other annoying stuff too many websites seem to incorporate these days.
Perhaps I'll make a proxy server which browses the web exclusively using Google's caching... word highlighting on all pages, fast browsing everywhere and working links to more cached pages... should work fine for any webpages below 100kB
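The URL-rewriting half of such a proxy could start out like the sketch below. It assumes cached copies can be requested with a search?q=cache:<address> style URL, like the cache links posted in this thread but without the per-page ID they carry; the helper name and prefix constant are made up, and whether Google actually answers such a request is not something this code checks.

```python
from urllib.parse import quote

# Assumed URL form, modelled on cache links seen in this thread.
CACHE_PREFIX = "http://www.google.com/search?q=cache:"


def cache_url(page_url: str) -> str:
    """Rewrite a normal page URL into a (hypothetical) cache lookup URL."""
    # Drop the scheme, since the cache operator takes a bare host/path.
    for scheme in ("http://", "https://"):
        if page_url.startswith(scheme):
            page_url = page_url[len(scheme):]
            break
    # Percent-encode anything unsafe but keep path separators readable.
    return CACHE_PREFIX + quote(page_url, safe="/")


if __name__ == "__main__":
    print(cache_url("http://www.example.com/some page.html"))
```

A real proxy would also have to rewrite every link inside the returned page the same way, which is the harder part and is left out here.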
As for the NY Times being annoyed with Google's cache, they can easily fix that themselves. Either that or Google's spiders are a lot smarter than I thought to automatically register themselves for the NY times. Furthermore, as far as I'm concerned everything that's publicly accessible on the web without some form of password protection (which would of course also block robots) should be cachable and archivable in whatever form you see fit. Respecting robots.txt is no more than a courtesy as far as I'm concerned. If you don't want your pages to be archived or cached or whatever, then by all means protect your page, or do not put up a webpage in the first place (I'm sure a thousand others will leap at the chance to fill the void).
--Swilver
well can't they just use meta tags to prevent archiving of their pages
<META NAME="robots" CONTENT="noarchive">
from
http://www.google.com/bot.html
I wonder where this will stop. NYTimes might get google to stop caching the direct link for a certain article. That is fine. But it is just one more step to do a search in google for the article with a few keywords from the article. If any person has been good enough to save it in a personal page, discussion board (like traditionally done for articles likely to be slashdotted) or any other place, the google results will show it. Would NYTimes now want to restrict google from showing these pages because of the copyrighted stuff? You will be amazed at how many articles I find this way. Many of them are just excerpts but others are complete.
Another thing on a tangent was that I really do hate the fact that information is restricted for just one fundamental reason - if it is not commonly available then it cannot be linked to in most of my writings for they are going to be unavailable to the party that I am writing to. This is especially true if the writing is not immediate but is meant to be read a month or two later. This is also relevant to Bloggers who might make comments and refer to a link, only to have the links go dead because the content is space,time, or space-time restricted. I am willing to pay for reading the articles, but before I can write about them I need to ensure that they are going to be available to my common readers. And as in the Blogging or P2P scenario I am not sure if one person is going to read my writing or thousands so buying a license for them is illogical. And then, if they need to send it further, are they also supposed to pay ??? Basically, for me to be able to write, to build upon existing work, to look ahead standing on the shoulder's of the giants, I need to be able to pass on the information. I am adding value because I am couching that content in a context, but until I can freely share the underlying articles too, my product is stunted. I can reach narrow audience but can't reach the common All this is very good in developing software where you might negotiate a deal once in a while to include someone's underlying code, but not writing where you might be writing 10-15 articles a week ...
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
The NYT is a local / regional paper when you get right down to it.
The New York Times wants Google to continue ranking their stories but they want Google to do them the special favor of only pointing to their registration page:
"We are working with Google to fix that problem--we're going to close it so when you click on a link it will take you to a registration page," said Christine Mohan, a spokeswoman at New York Times Digital,
If I were Google, I'd tell them such advertising services would cost them a great deal of money. That or simply drop the New York Times right into the bit bucket. It will cost Google programming time to make it happen and computing time to keep it going. If every site on the web required this kind of custom treatment, Google's task would be much more difficult and it might be easier for them to drop it.
Dropping the NYT from Google is fine by me. People who don't understand the implications of digital publishing don't deserve readership. If they won't let librarians make digital copies, libraries should drop them too. What's next, the New York Times sends cease and desist orders to everyone who runs a proxy? It's like the NYT is trying to make their digital publication harder to share than their paper one was. A paper copy can be shared by an entire office and that's what a proxy does. A paper copy can be indexed and archived by a librarian, and Google did not even do that much. One day the paper version won't be available. If librarians can't keep their own copies of the digital version for verification, the publication will have no credibility. If the New York Times wants to continue charging advertisers for eyeballs, they had better remember that their credibility is based in part on widespread availability.
Friends don't help friends install M$ junk.
Is there any easy (spam isn't such a problem for me - touch wood - that I'm willing to spend ages looking into where it comes from) way of telling where this stuff originates from?
"It's a good thing, too, since in post-USA-PATRIOT Act America, what you read can and will be used against you in a secret military tribunal."
You must be a member of al Qaeda, because only guys like that have to worry about this at all.
When I think NYT, I think "Lying, racist, plagiarist reporters".
And then I look elsewhere for my news. Interesting that their registration system started requiring email verification right about the time their collective journalistic reputation went down the toilet.
"Oh my God. This is terrible. This is the end of my Presidency. I'm fucked."; ~ Donald J. Trump
Is there some problem with readers, with editors, hell, with story submitters, actually reading the damn article before making snide speculations?
"We're going to [fix] it so when you click on a link it will take you to a registration page," said Christine Mohan, a spokeswoman at New York Times Digital, the publisher of NYTimes.com.
That's why they don't just tell google to not cache. They want the links to appear, but not to the stories themselves.
How about we discuss that issue, rather than some other, theoretical issue? I know it's an alien concept, but let's give it a try.
Here, I'll start it off. It looks like a decent idea. Google still gets the links, the NYT still gets the traffic, everyone gets to find the articles they want. What's not to like?
If you were blocking sigs, you wouldn't have to read this.
If the information is being copied and circumventing the NYT's usual requirements for access, then this is not the NYT's problem, it's Google's. A good question might be how Google's robots can actually circumvent that access in the first place, but I'm sure someone's thought of that somewhere I haven't noticed yet...
OTOH, Google is quite at liberty not to list the NYT in its results if it so wishes, which presumably wouldn't be the outcome the NYT would be hoping for (and would presumably get if employing robots.txt).
The moral onus here is clearly on Google to ensure that if they are changing the way information is presented then they do so in a manner acceptable to the provider of that information. Or did you expect the NYT to contact anyone in the world who might be interested in caching their site? The "we don't need any legal recourse" argument is pretty weak too; it basically assumes that everyone in the world (a) knows about and (b) obeys robots.txt, which is clearly nowhere close to correct.
All in all, if both companies are looking for a constructive solution to this problem that benefits all concerned, it seems pretty sensible for them to get around the table, discuss what they want to happen, and make it so.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
The technology has changed the way that things work but the law has not kept up with it. To start with, we continue to talk about "copyright". Controlling copying of information makes sense when the distribution mechanism is trucks moving bales of paper around. Once you start sending bits around, everything is copied. From the article:
And technically, any time a Web surfer visits a site, that visit could be interpreted as a copyright violation, because the page is temporarily cached in the user's computer memory.
When you have the newspaper delivered to your door, the content basically comes for free (the cost of a newspaper doesn't pay for much more than printing and handling). However, you get to keep the content as long as you like, chop it into bits and what not. Libraries have archives of newspapers going back years and you get to see them for free. What's the right mechanism as we move forward? The "pay per view" model that content providers want to shove down our throats courtesy of the DMCA is not pretty and when it starts to affect the average Joe I suspect it will be booed out of favor pretty quickly. But what is the right mechanism to make sure content providers get paid something and that we, the citizens, get something for our money?
Allows them to track your reading habits and if you seem interested in articles on flying lessons, high buildings, presidential movements and bathtub biochemistry a flashy light goes off somewhere.
The NYTimes has the same error rate as any organization. Yes, Jayson Blair was a bad apple. Yes too, the NYTimes does a much better job than most organizations in corrections and reporting their own errors and trying to learn and move on. /. contrasts terribly in this regard.
AND YOU DO TO, DON'T YOU READER?
The parent post was not insightful, just flamebait.
Err. Where?
As this website shows,
Google is not affiliated with the authors of this page nor responsible for its content.
...is that sometimes it's used in new ways.
For example, I have contributed thousands of posts to Usenet over the past decade. When I first started, pretty much no-one had even heard of the x-no-archive header. Even today, few mainstream newsreaders support it readily. Thus no-one bothered to set it.
Skip forward a few years, and now Deja News is offering an archive of all my old posts, along with everyone else's. You have a commercial organisation making money by using my words without informing me, nor with my permission. That, in itself, is a dicey proposition, although at least there's a "public interest" argument in its favour. The buy-out of the database by Google, now blatantly selling my words without my permission, is clearly over the line, though.
Now, personally, the principle of this annoys me, but the actual content being available doesn't bother me, because I'm always careful not to write anything I wouldn't want someone to read in future in a Usenet post. I can see why it would legitimately bother others, though, particularly if they expected their posts to expire and disappear a week or two after they were written, and not to show up on a potential employer's report six years later. Claiming that people should have seen this coming and put "x-no-archive: yes" on everything they posted a decade ago is simply unrealistic.
You can make a similar argument for web caches now. Until a year or two ago, with things like the Wayback Machine and Google Cache coming up, a search engine was just a search engine, and always linked to your site. The need to use robots.txt to protect your material simply didn't exist on the same scale then.
I think there is every reason to have lawyers involved here, because at the moment, the law lags behind the technology. You can't copy others' material in other media without adhering to certain rules and regulations, and new evolutions in the way the Internet is being used may require new legislation to prevent abuses of rights that were previously unchallenged. Given the number of ill-formed and illogical arguments made by many posters here, typically those who want everything to be free to them even though they've done nothing to earn it, what other solution would you suggest?
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Actually, free reg requires a valid email id. It thus filters most bogus registrations
I know what you mean. For a while, my 'valid' email ID was 'root@nytimes.com,' but they eventually caught up to me. Now it's 'sales@nytimes.com.' And if you think there is any legitimate information in my registration, then you would be in error.
What those who want activist courts fear is rule by the people.
Wrong!
As many others have emphasised, it is easy to turn off the Google cache for whatever pages you wish. But, in the case of the NYT, there is a further factor. They must have special code within their system to recognise the Google spider and allow it access without registration. Either that, or there is some other prior agreement allowing access. Given that, they can scarcely claim extra work to support Google. I believe the whole thing is mainly to get some free publicity for their site. I suppose the other possibility is that they want the page accessible from Google News but not the regular search engine cache.
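For what it's worth, the kind of "special code" being guessed at here would not need to be complicated. The following is only a sketch of the idea, not anything the NYT is known to run: the function name, the token list, and the decision to key on the User-Agent header alone are all assumptions made for illustration.

```python
# Hypothetical sketch of a registration wall that waves crawlers through.
# CRAWLER_TOKENS and needs_registration are made-up names for illustration.
CRAWLER_TOKENS = ("googlebot",)  # assumed list of crawler name fragments


def needs_registration(user_agent: str, logged_in: bool) -> bool:
    """Return True if the visitor should be sent to the registration page."""
    if logged_in:
        return False
    ua = user_agent.lower()
    # Anything that announces itself as a known crawler skips the wall.
    return not any(token in ua for token in CRAWLER_TOKENS)


print(needs_registration("Mozilla/4.0 (compatible; MSIE 6.0)", logged_in=False))
print(needs_registration("Googlebot/2.1 (+http://www.googlebot.com/bot.html)", logged_in=False))
```

A check like this is trivially spoofable, since the User-Agent header is whatever the client says it is, so a real site would presumably also look at where the request comes from.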
The NYT needs to call off the lawyers and seriously think about how they brought this on themselves.
There are so many models for running a news site that avoid this problem (Salon) that calling out the lawyers is just childish and inappropriate. If a site wants to be indexed by a search engine, then they should be aware of what that means, and if they don't like how a particular search engine functions, then they should take measures to change their own site to prevent what they don't want indexed, or cached, from being accessed.
I know that finding pages on Google that I cannot access would be infuriating, and I hope that Google realizes that many of their users would agree.
Read, L
So you, and others here, have said. And yet, people keep linking to the NYT stories. Why's that?
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Google is one of the few complex services on the web that is almost always relevant when one tries to use it. The Google cache is one great feature. If they manage to unnecessarily gut that, I wonder what other features they will find to complain about next.
It's absolutely the most annoying thing that I can think of.
... If so, they may consider getting rid of it.
If Google closes its backdoor, I will simply never read another New York Times article again.
I hope that other people feel the same way
I am one of (the not so) few that actually gave them a real email address. I have not received one piece of spam from them. That said, I doubt addresses are collected for spamming purposes. For something else down the line, maybe. Email based subscription delivery? It's been done elsewhere.
*shrug*
[Set Cain on fire and steal his lute.]
Just check the aforementioned link; they blocked 2004 and 2005 a bit soon, didn't they?
... :-)
And now, ladies and gentlemen, please begin your conspiracy theories about those prewritten news articles and so on
I don't know how valid the address needs to be. I just used the "from" address on a piece of SPAM I had been sent previously. It seemed to work; certainly it didn't e-mail an auto-generated password to that address or anything.
Je fume. Tu fumes. Nous fûmes!
it makes it more difficult for the evile wons to change their phonIE fauxking billyonerror 'storIEs', every time they are caught bullowing ?pr? smoke/execrable up yOUR .asps again.
lookout bullow.
The banner ad at the top of the Slashdot home page when I posted this reply was for the New York Times...
And if they require a phone number, use 867-5309 # ask for Jenny
Need a Linux consultant in New Orleans?
Yeah, and the NYT will probably want to patent NOARCHIVE (i.e. technology to prevent unwanted access to a site by spiders)... Makes me wanna puke...
how long until
Oh please. Kindly refrain from sharing your paranoid delusions with us.
Has anyone else noticed that if you click a link from http://news.google.com that points to a NYT article, you do not have to log in to read it? Google and the NYT must have some sort of arrangement regarding that. I don't see what the problem with having their pages cached is anyway. They must have a special provision to allow Google to read the articles without logging in, or they would not get crawled. Has anybody tried browsing the NYT with Googlebot as their user agent string? I did, and I still had to log in. Probably the actual user agent string that Googlebot uses has some numbers etc. after it. I still bet that if you used Googlebot's actual user agent string, you would find that you did not have to log in to view stuff that you had to before. I ought to try that whilst browsing for pr0n. I bet you wouldn't have to log in to some pay sites.
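For anyone who wants to repeat the experiment, setting the user agent string takes a couple of lines. This is a rough sketch only: the string below is the one Google documents for its crawler nowadays and may not match what the bot sent at the time of this discussion, the URL is a placeholder, and build_request is a made-up helper name. Nothing here shows that any site actually keys on this header.

```python
import urllib.request

# Assumed user agent string; the crawler's real string has changed over time.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build (but do not send) a request that announces itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.example.com/article.html")  # placeholder URL
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would actually fetch the page. A site that verifies the crawler by its network address rather than by this header would not be fooled.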
Really? Where do I find out about this. If this is true (I'm sceptical...) then I guess it's lucky I stick mostly with BBC News for my news online...
If the NYT is showing google one page (without needing them to login), and the users another (requiring them to login), then that's cloaking. Really, google should be the one to remove their pages, as anyone who's caught cloaking is normally banned.
SearchIRC - Now with live chat directory!
One would think that if the content of the NYTimes is copyrighted, they would, in fact, have the right to tell Google to just stop caching their web pages. Particularly if the web content is the same as what they've put into print. NYTimes could argue that the Google cache is a violation of their copyright to the printed paper.
Awk! Pieces of eight. Pieces of eight. Pieces of seven... ERROR: General Protection Fault. [Paroty Error.]
If you think that is idiocy then you've not been keeping up with current US govt practice, policy and law have you?
While that example might not happen, in a country where librarians are expected to report reading habits it is not out of the bounds of reality to see it getting to that stage. Not likely but not entirely improbable or impossible.
...then you should have taken appropriate measures to ensure it could not be cached before you opted to publish it to the whole world.
I visit many sites which require logins. I can never remember them, so I just create a new one each time if I desperately want access to some information. If I'm not that bothered, or if it's a commerce site, I just go elsewhere.
Government of the people, by corporate executives, for corporate profits.
I mean, I never ever fill in the truth to online marketing forms. Why would I? It's much more fun to put other information in.
Hands up anyone who does fill in forms with the truth!
Government of the people, by corporate executives, for corporate profits.
Actually, that's a pointless point. Of course google doesn't produce anything; they are a meta data service. Search engines and collators for websites, for news for images and who knows what else.
The issue is whether or not they should be able to collate data that is in some way secured. And on that I'm offering no opinion mainly because I can see all sides of this and hats are all too grey to be able to distinguish for me.
I was looking for a better way to make sure I didn't read any NYT articles, so this will help a lot. Guess it is time for more OpinionJournal.
Onward to the Aether Sphere!
Secondly, news sites are planning to go the 'pay' way in about a couple of years.
Good! I hope they do. That would open the door for independent and alternative news organizations... unless the man gobbles them up or silences them another way.
LilMikey.com... I'll stop doing it when you sto
"Big business is taking the internet away from its free beginnings. Look for an association like the RIAA to form for news/content site to act as the legal executioner of sites posting copyright materials."
So that's what the whole internet thing is about! How dare those companies withhold the content they created from us! How dare they!
BIG CLUE: Free speech as in you can run your mouth, and no one can stop you. Not everyone else's speech free for you.[1]
[1]Your posts fall under copyright. However you give implicit permission for others to read it free of charge by posting here. You can if you want to try closing it off and charging for it.
Note that the pages are indexed, but Google users can't access cached data
Appears they're using the NOARCHIVE meta tag
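You can check a page for this yourself by looking for a robots meta tag whose content includes "noarchive". The following is a minimal sketch using only the Python standard library; the class and function names are invented for this example, and the sample HTML is made up rather than copied from any real site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if the page asks search engines not to offer a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


sample = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head><body>story</body></html>'
print(has_noarchive(sample))
```

A page carrying the tag can still be indexed and show up in results; the tag only asks that no cached copy be offered, which matches what is described above.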
Is it big news when the NYT has to call Google for tech support?
They can't just stop google caching the pages because then google would stop putting their news in news.google.com.
Personally I can't understand the problem, you don't have to register to view articles through news.google.com anyway.
It wouldn't be annoying if you register once, then forget about it. I registered years ago, so I never have to see the reg page again. Took about two minutes, and well worth the effort for a "free" subscription to the NY Times. Have you seen the application form for PC Week and similar magazines? THOSE are a pain; dozens of questions to fill out, including things like income and potentially confidential info about your company. If you are that concerned about your private info (as has already been stated) just use a fake name.
I am not worried about being tracked, but rather don't find the content of the NY Times compelling enough to bother acquiring one more username and password. On the other hand, I've registered with slashdot for the amusement of karma and to my.yahoo for the spamcatcher email account and personalized weather.
;-)
I don't complain to Slashdot, but I did email the NY Times and tell them as much. (They graciously offered to sell me a paper subscription, no email registration required.)
I also don't avoid no-registration links or Slashdot posts that contain copies of NY Times articles. I guess that makes me a hypocrite.
"Rub her feet." -- L.L.
There is. How about the Creative Commons?
At least here in Sweden, if you say something is "free", then you are expressly prohibited from requiring anything in return - be it any action, information, or goods.
Sounds like obvious common sense, but the lack of this legislation (and mindset) in the US has started to slowly creep over here. Fortunately, companies get a quick and cold wake-up when they translate their American ads saying "Free router! (with purchase of X and Y)" and run them here, and instantly become required to hand over routers unconditionally to anybody who asks for them - the condition isn't valid since you are using the magic word "free".
Requiring registration is, by definition, not free. You are required to do something in return.
"Basically all I am saying is that there should be a movement similar to Open Source not only for software products, but for journalistic content."
Hehe. Everyone wants to be a Gutenberg, anyway...
1-School newspapers and magazines.
2-Those free papers, and magazines you can pick up around town.
3-PBS and NPR.
4-My scratchings on bathroom walls.
Anyway you can lobby for laws to prevent them from closing up content they either bought, or produced themselves. However remember taking away other people's rights means yours are next in line for the same process. Glad you're willing to make the sacrifice.
So which is the real real world? The one where you spend the afternoon on your porch reading a book to your mate, or the one where you sit in front of a television and "reap the rewards" of advertising, so you can buy more stuff, presumably?
I am not saying my world is universally better than your world, but it is just as real.
V
" 'Why should the RIAA change its business model to a pennies per song method'
Because nothing else is going to work. If they refuse, they'll die."
And yet the proponents make this argument without proof.
Sounds like wishful thinking to me.
"USA has a bad reputation worldwide"
Not among people who are informed about things.
Every time a cached link is clicked, pay sites like the New York Times can receive notice from Google (easy to automate this) that one of their pages (which is cached in Google) has been accessed, and all advertisements in the cache have been displayed (Google caches Ads in the page as well as the contents). This allows the website to "offload" traffic and at the same time keeping the books on the number of times their Ads have been viewed so that they can send the accounting record to their paid Advertisers.
Google would find this very simple to implement, and paid sites would find this very beneficial (borrowing Google's enormous bandwidth and server capabilities for free) and at the same time should solve most of their concerns. After all, Google's cache isn't sufficient for proper access to ALL the paid content at the New York Times as the cache is temporary in nature. Also, it's too spotty in coverage to be considered reliable enough for really digging into a paid site's entire content.
Using Google like this is akin to using Google as a window into the pay-site's house of content. You can see part of a room, but not the whole interior. Now, every time someone peeks, the House gets notified and can get paid for it. The more windows Google adds to the House, the more chances the House gets paid.
Why doesn't Slashdot cache news articles and stories before running a story? It would make a lot of sense for text based news items.
I use Google news pretty regularly, and I've noticed that some of their links are to paid subscription sites. These are clearly marked as such ("subscription").
;=)
I don't generally click on those links, but I think it's a good idea, since I'm not actually going to Google for the news, rather for links to the news. The reason I personally don't click on the subscription links is that I have my favorite set of real newspaper sites (some registration, some free, some not) and that's not what I'm using Google News to find. Someone else, however, probably is using it that way.
I would guess that Google gets something back from that sort of link, since the site owner is getting more from the link than Google is from the listing. (Maybe I'm wrong, of course.)
It makes perfect sense to have something like that for the regular search engine, and to charge for it, as long as it doesn't affect the link's rank in the search results.
For example they could have a special command for robots.txt (or google.txt maybe) that would allow Google to access and cache the page, but the regular link would go to some registration page (easy to do) *and* the cache link would also go to some kind of registration page, defined in the google.txt file.
The NYT would promise that the cached page is really the cached page, and pay Google something for redirecting to NYT's cache (with registration). Or even better, there would be some kind of redirect where I actually get the cache from Google after I've registered with NYT.
They're probably thinking of something like that, because otherwise the solution would be to simply disallow caching, and that wouldn't be news, would it?
This Like That - fun with words!
The question is framed very narrowly by Slashdot, so this discussion misses the larger issues. The cache copy is an issue in Google's main index for many webmasters. The Google News situation is a subset of a larger problem; the cached link doesn't exist in Google News. Google News is a much narrower issue. I'd like to bring up the issue of full-text caching done by Google in their main index.
I feel the cache should be opt-in, not opt-out. The only way you can avoid it right now is to place a "noarchive" meta on every page in your site. On some file types, such as .txt files, there's no place to insert a "noarchive" and Google goes ahead and caches it anyway.
My problem with the cache is that it gives Google a competitive advantage that is unfair, and furthers their monopoly. This is especially unfair since it is most likely illegal -- assuming that you could ever get a good test case into court, or get a class action lawsuit going by some webmasters, publishers, or search engines.
To add to the attractiveness of the cache copy, consider what Google has done:
1) The cache copy makes it possible to highlight the search terms, whether or not you have the toolbar installed.
2) The download time for the cache copy from Google's servers is always faster than from the original website.
3) You never get a 404 "not found" or a DNS lookup failure for the cache copy.
4) The link to the page recommended by Google for bookmarking at the top of the cache copy is a link to Google's copy, not to the original page.
5) How about all that Google branding on the top of the cache copy? Priceless.
The cache copy tends to keep eyeballs on google.com, and increases their searches. You may have noticed that many major news sites won't link to other websites in their stories anymore, but rather just mention the relevant site without putting a link behind it. That's because they don't want eyeballs wandering off of their page. A wandering eyeball may not come back and look at more ads. That's basically one of the big reasons behind the cache copy as well -- it keeps eyeballs from wandering as much as they would without the cache.
All the Google partners -- AOL, Earthlink, Yahoo, Netscape -- don't include the cache links, and I assume that this is the reason. They don't want people wandering off to Google and staying there.
As new competition is organizing to challenge Google's monopoly, from places such as Overture (Alltheweb and AltaVista), Yahoo (Inktomi), AskJeeves/Teoma and Microsoft, these engines have to consider whether to fight Google on the cache copy, or offer their own cache copy even if they think it is illegal. There isn't really any middle ground on this.
Many observers with legal expertise feel that while the snippets are "fair use" of a website's content, offering the full text in a cache version is not. Copyright law requires "express permission," but Google only offers an incomplete and inconvenient opt-out. I suspect that the legal departments of these other engines are more inclined to challenge Google rather than launch into their own violations of copyright law.
Google seems to be the only search site around, and they censor and distort like mad.
Dude. Put your tinfoil hat back on, take your meds, and go sit in the corner until they take effect.
Not exactly news. Both are part of our state-controlled media.
Did you know you need a security clearance to work for Google?
That their Usenet policy allows them to drop any posts from other ISPs that they don't like?
BeOS Stock Scandal with Microsoft Brewing
Your comment was confusing to me until I realized that you are talking about giving NYT an actual email address. Why would you do that? Isn't that why we have hotmail.com? Give an address that does not exist or a throw-away address.
Last week I was registering at a web site and I put in xx@xx.com for the address. The system responded, "This address has already been registered." So then I put in xxx@xxx.com. The system responded, "This address has already been registered." So I entered xxxx@xxxx.com. Same response. Finally I awoke fully and entered some Ds, xxxxdd@xxxx.com, and the system accepted my "registration".
But you know, you DON'T have to give a real name or email for NYT or JPost, or most of the others, they don't send you your pass and uid, you know.
It's not the spam that's the problem, if you use your head, you get no spam. It's the hassle of logging on
The problem is, Sir, things cost money to produce. Newspapers cost a hell of a lot more than 50 cents or a buck fifty or whatever. Want to ACTUALLY pay for the content you read? Try $5 or $10 per issue! Can't handle that much? Put up with the ads.
Actually, free reg requires a valid email id. It thus filters most bogus registrations.
Crap. I am jizz_sucker, age 97, and my email is qwert@qwert.com. Email me and see if I respond.
Before that I was quwambi_bartok at an email made of random key strokes.
Hmm? I generally do not read newspapers. Sorry about the feint there, I was just hypothesizing about a dream advertising assimilation engine that erased itself, one of those things most people refer to as a joke. I don't actually read the Times, or any other major newspaper.
V
jkrise wrote: ...Secondly, news sites are planning to go the 'pay' way in about a couple of years...
Anonymous coward replied:
Can't wait for that day, I'll subscribe and post all the news on my site for FREE! And I'll be the only one, generating millions from ad revenue, because I'm the only free news web site in the world.
While for the time being, Google is THE search resource, it was not always this way. Google will fall under its own weight and some other whiz-bang will take the lead. Remember AltaVista? Excite? HotBot? Where are they now? They are all where Google is going fast. Who will spring for the next big search site? Some soon to be rich college kid who will eventually sell out and go play with his trophy wife and boat...
--Mike--
I can't say the same about the NYT.
~Berj
I have an NYT account. Do I care if they know what I read on their site? About as much as I care when the next "American Idol" rerun is on (which is to say, not at all.) Why on earth are you fuckers so paranoid about this? I see absolutely nothing wrong with tracking as long as it's limited to the originating site.
Get over it, for God's sake.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
I've been registered with the NYT from their beginning. I receive little spam from any source, and none that I can trace to them. The only mail I get from the NYT is mail I requested.
-- Slashdot: When Public Access TV Says "No"
That's taking comparisons of Slashdot and NYT about as far as they can go without demeaning the NYT.
The Times practices real journalism; Slashdot practices poaching on real journalism.
-- Slashdot: When Public Access TV Says "No"
Christ all fucking mighty! It's an EMAIL ADDRESS! If you're SO concerned about it, MAKE ANOTHER ONE and use it for registration! What the hell is the big deal; why is everyone SO pissy about the NY Times for offering a FREE service (and yes, unless you're a fucking retard, you should consider it to be free)? An email address is not currency. It has no value. It is worth NOTHING. I can make billions upon billions of them whenever I want. Why the bloody hell do so many people on slashdot take offense to this absolutely innocuous information request?
Jesus!
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Why is NY Times even in "discussions" for this, other than to gain some column inches?
I don't think this is the case. For one thing, it doesn't seem like NYT is playing up this story the way I would expect, if they were using it as a publicity tool.
My guess is that among other things, NYT and Google are discussing what kinds of demographic info Google could provide NYT on those who access the cache, and what the cost of that kind of contract would be. I'm sure that NYT would like to be able to tell their advertisers that "If you buy the 'Geek' package, your ad will be put directly in front of 750k geeks with subscriptions, and on the stories that get to slashdot, you can expect another 250k tinfoilhat viewers, through the Google archive."
So I think Google and NYT are looking for win-win combinations. Since this is a brand-new playing field, it is probably going to take a while to figure out. I'd guess they are even having to work out how to keep score.
Requiring registration is, by definition, not free. You are required to do something in return.
I agree; that's why I hired someone to register for me, thus making it completely free.
people are using technology for what it was intended to do!
At the heart of Google's caching dilemma lies a thorny legal problem involving a core Web technology: When is it acceptable to copy someone else's Web page, even temporarily?
When your server and pages say it's alright (or don't say that it's not alright.) The standards for the web are very clear on this, but non techie companies (and some judges) don't seem to get this.
This reminds me of the issues of "deep linking" that everybody was suing over a couple of years ago. That's exactly what the web was designed to do, but these johnny-come-lately companies put sites up, and expect people to stop using the technology for what it was designed for.
If only the EFF was as well funded as the ACLU...
I've been subbed to the NYT for a long time, with a trackable email address used only for the registration. I opted out of all of the email crap.
I've never been spammed at that particular address. I would conclude that the NYT is actually ethical wrt their email database.
I forget what 8 was for.
Oh, you don't like it? Of course we won't cache your pages then, we'll even make sure your pages are completely removed from our index. From now on a search on New York Times will give nothing but gay porn. If there's anything else we can do for you don't hesitate to ask. Thank you!
>Shouldn't the NY Times simply tell Google not to cache their site?
No, the NY Times should stop crying and apply some technical knowhow.
2 solutions:
There is this little thing called a robots.txt file. NYT should learn how to use it.
There is a little thing called the login. NYT should properly secure it to keep spiders out.
Oh, that's right, they want page rankings with Google... so google needs to be able to index the site and pull the content. Hrmm... what to do...
Darn, that is just too bad. The cache is part of the deal when you want your site to come up in search results on Google. Love it or leave it.
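To make the trade-off concrete, here is roughly how a robots.txt rule plays out, checked with the parser in Python's standard library. The file contents, the /archive/ path, and the example.com URLs are invented for illustration; this is not the NYT's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /archive/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("Googlebot", "http://www.example.com/archive/story.html"))
print(allowed("Googlebot", "http://www.example.com/2003/news.html"))
```

Note that a Disallow rule keeps a well-behaved crawler from fetching the page at all, so its text never gets indexed. That is exactly the bind described above: the site wants the ranking without the cached copy, which robots.txt alone does not give it.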
I hope that Google tells them where to go. They can you know...
"Google is red hot
That NYT site ain't doodly squat"
I have never read an online NYT article. My life is just fine without NYTOL. I often depend on Google to find useful information. I would not miss NYT results. Funny, I have never even seen NYT come up when I search for stuff.
Aren't NYT articles half fiction anyway?
The whole idea of free but requiring a login to view the content is incredibly stupid anyway. I am not giving them my email so they can sell it to spammers....
l8,
AC
You've got to feal sorry for the guys who actually own: xx.com xxx.com abc.com qwert.com etc there spam boxes must be enourmous.
My spelling isn't bad, I'm evolving the language
They just need to stick <meta name="robots" content="noarchive" /> in their headers. How hard is that? Not.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
which seems, to me, to be praising Henry instead of pointing out how useless this sacrifice was
The Mongrel Dogs Who Teach
How about if the Times got over their registration fetish?
From the Times Subscriber Agreement:
What is meant by "exploit"???
From the "Forums and Discussions" section:
What is meant by "abusive"???
And how about this:
Interpretation: The user/poster is entirely responsible for the content of their post, which the Times may alter in any way. Yikes!!! Granted, this applies only to content submitted to the Times, but the wording seems pretty scary.
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
..to censor their cache. Those that don't want their content cached should fix their web servers and firewalls first. My web site prohibits known web crawler bots, and Google doesn't cache it. No problem! I didn't have to harass Google about it and they don't have to break their own promise to not be evil.
-- I am. Therefore, I think!
my personal favorite has always been - ucan@suckit.now
"Warning: MEMRI and ADL are dangerous propaganda machines."
Herr EyeEye, did you join the Hitler Youth when you were 6, or did you wait until you were 9 years old?
" What the hell are you talking about? the USA has a bad rep worldwide _precisely_ in the crowd of people who know such things."
The hatred is strongest among the evil-minded and ignorant. France protests against the U.S. just a couple of years after they had protests about "Jews controlling the world". It has a bad reputation only because too many people are ignorant.
"Lay off the patriotic crack pipe for a little bit."
Not as much patriotic as informed about global issues.
"Any good search engine should ignore "robots.txt". If you don't want it read, don't put it up on the web in the first place."
And if you don't want people breaking into your house, don't build one. If you don't want people driving your car without permission, then don't buy one.
"So sue every user with Netscape, MSIE, and Opera: they are copying web content into their own caches all the time."
There's permission for this embedded in copyright law (You know? The one that every "/." reader is an expert on).
username: jaysonblaire
password: foolja
Either way I'm not registering with the NY Times. I don't want to bother with another login just to read a newspaper. It would be nice if Slashdot would just ignore them and give its business somewhere else.
" my personal favorite has always been - ucan@suckit.now"
I'm fond of
nevergiven@neverspammed.com
happens every time you go to a website. Creating a copy of the content is the primary means of internet communication. I don't see how google caching the pages is any different from me viewing it in my browser. It's not like google takes the credit for the content. If it were so, there would be no way for any web search to work without owning all the searchable content.
Umm, I don't see how you proved that I am in any way Anti-Semitic. I just said that I hate the ostensible inaccuracies and biases towards the Israelis. There's absolutely no necessity for such a predisposition in today's world, and it just irks me to no end when I read a lengthy report on an Israeli death due to Hamas gunfire in the midst of an Israeli led attack, only to briefly -- two to three sentences -- mention in the conclusion on Palestinian deaths caused by Israelis. That's not just or legitimate journalism.
It should be illegal for the NYTimes to prevent access to those articles. If they do not want to pay to host the sites, fine, they can give them to another archive that will, but that information is ours to access like it would be if we were to visit a library.
All data is speech. All speech is Free.
Has anyone noticed the "survey" page from the Washington Post? I have found that it is impossible to put a birthdate of 1899. Now, there are a few people in the world who are 104 years old or more (I am not one of them), so how can they truthfully respond to the Washington Post's reader survey? Or does the Washington Post assume that people of that age don't use the Internet?
move or be moved
You are welcome to use xxxxdd@xxxx.com any time.
Tried it. "This address has already been registered." Doh!
-Ryan, with the unoriginal sig
" only to briefly -- two to three sentences -- mention in the conclusion on Palestinian deaths caused by Israelis."
This is because the deaths caused by Israelis are very rare and typically actually caused by Palestinian militants who use their own civilians as human shields.
Even now, groups like Hamas and PFLP still locate their hideouts in civilian areas.
"But Google's opt out program is more akin to telling burglars "Hey, I didn't want you to steal my TV, please return it." The difference between Google and a real burglar is of course that Google "returns the tv" if you ask them to do it, but returning stolen goods doesn't make theft legal."
More horrific abuse of the English language. It is not theft if nothing is stolen, and when something is copied it is never stolen (as it never meets the part of the definition of theft which includes "taking away").
it used to be, for the first few months of google news' existence, that you could check the cached copy of the news article as it was fed to google.
this isn't what people think. the cached copy today's post refers to is the google search engine cached copy, the one spidered by the crawling bot.
About two months ago google stopped access to the google news direct cache. You could request the cache for a document the same way you always did, with a cache:url.tld/blah.html, but you had to do it from the newsDOT server, which the toolbar did not default to. In effect if you 'checked the cache' on a news story it would redirect you to wwwDOT instead of newsDOT and tell you no such document exists. a simple rewrite of the subdomain, and pow, a plain text cache of how the news story was fed to google.
I have to say, in the couple of months that we had access to this, information watching was a whole new field. I personally came across more than a handful of articles that were pulled within 24 hours by the newspapers in question. Crazy interesting. no tinfoil involved.
Speaking from personal experience, Google's sloppy handling of meta-tags (which can tell web spiders/crawlers to "go away" and not index a given site) is a long standing issue which won't go away overnight. On LiveJournal, an individual user is allowed to select a check-box on their settings page which determines whether search engines may index that user's journal or not. I've got my journal set up to disallow indexing.
Theoretically, Google should not turn up search results from my journal.
However, the reality is, people have sometimes been able to find entries in my journal using Google. This got me in hot water a while back, because I had a public journal entry pertaining to my use of a wireless access point installed on the network of a certain company... That was how I got traced.
Without commenting on the ethics or morality of what I did, or whether I deserved the fallout that came after, I'd like to point out that none of this would have happened if Google had respected the meta-tags on my journal, which expressly forbade indexing.
So the only way for Google to honor the wishes of the New York Times is to give the NYT preferential treatment -- and even then, a few articles might slip through the cracks. I used to think that web caching and indexing sites regardless of the wishes of the site owner was no big deal. Now, I'm not sure where I stand.
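For reference, the kind of check a well-behaved crawler would make against those meta-tags can be sketched with Python's standard `html.parser`. This is only an illustration: the sample page is hypothetical, the post doesn't say exactly what markup LiveJournal emits, and nothing here reflects how Google's crawler is actually implemented. It assumes the usual robots meta conventions, where `noindex` (or `none`) asks not to be indexed and `noarchive` asks that no cached copy be shown.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(html: str) -> bool:
    """False if the page asks not to be indexed."""
    return not ({"noindex", "none"} & robots_directives(html))


def may_cache(html: str) -> bool:
    """False if the page asks that no cached copy be kept."""
    return "noarchive" not in robots_directives(html)


if __name__ == "__main__":
    # Hypothetical journal page that opts out of indexing and caching.
    page = ('<html><head><meta name="robots" content="noindex, noarchive">'
            '</head><body>journal entry</body></html>')
    print(may_index(page), may_cache(page))
```

Like robots.txt, the tag is only a request. A crawler has to fetch and parse the page before it can even see it, so a crawler that mishandles the tag leaves the site owner with no recourse.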
You know they're just aching for summa dat cache money. ...yo.
We're talking about users registering to view web pages. Not authors registering with the copyright office. Presumably, a large media organization like the NY Times registers all their works with the copyright office.
Start your own newspaper. It's a free country. Get bigger/better than the NYT. Get reliable sources people can count on. Get top notch articles. Get the stuff people want to read at your site and then DON'T REQUIRE LOGIN and we'll all use your newspaper site instead of the NYT.
Keep in mind there are tons and tons of newspapers out there, and yet we still link to NYT more than most others, and that's not for nothing.
Good luck. Be sure to post here when you're up and running.
Oh, and if you're thinking of simply caching the NYT every day, don't bother. That's w34k.
because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
You can't start your own newspaper, because Clear Channel controls all of the media.
This is what Moveon.org told me.
"If I happen to have sex with the blinds open, should I have to assume that somebody like you is standing in the building across the street with binoculars?"
Bad analogy. We're talking about putting stuff up on the public Internet, not leaving our firewalls down. A better analogy would be having sex in the middle of the street. *GASP* somebody saw it!
" "if you didn't want to read spam and learn about my valuable products, you wouldn't have an email address.""
Nothing like that at all: the spam intrudes into your space. Reading and caching web pages you put out into the public involves nothing like this.
" but what kind of jerk would want to actively create a world where that was the case (which is what you seem to be advocating)? "
A world where if you publish something *GASP* someone might read it.
username: Slashdot33
email addy: slasher@slashdot.org
password: Slashdot
I just don't get it. The user ID Slashdot was already taken!
Maybe we can agree that the NYT is a well-written, serious and interesting newspaper.
If you agree with the political agenda of the NYT, then it is well written, serious and interesting. Case in point: All of the many articles written with the express purpose of pressuring the golf club where the Masters tournament is held (Augusta) to accept women as members. Remember, Augusta is a private club receiving no funding from any level of government. Shouldn't a private club be allowed to admit whomever they want?
Do you think the NYT would back me if I wanted to join a welders' union but was not a welder? Should the welders' union be forced to accept me?
How about backing a male running for president of NOW?
The NYT wants a double standard: they only want to apply rules to some organization or person when it suits the political goals of the NYT, and they do not want to apply them when it does not.
Yahoo....for email
NYTimes wants Google to cache their site otherwise NYTimes will drop from the scope of millions of Google users. That's ad revenues and everything dropping.
The value of Google users accessing NYTimes via cache and reinforcing brand recognition of the news is not the maximum value of the news-product.
The NYTimes just wants to have its cake and eat it, too.
Why would NYT want their content cached AND not want anyone to actually use the cached page? What's the point of such a cached copy -- besides wasting Google's resources?
Life is the slowest way to death.
The Internet simply does not need news and information web sites that require registration.
Stop registering.
There are plenty of sites out there that have things to offer without requiring registration in return. It is a privilege to be able to put a site on the net and have it be viewed by many people, and the attention given by many people is payment in and of itself. I reject additional payment in the form of registration.
Make NYT choose between getting with the program or dying. Don't post stories with links to NYT. When you see a page from NYT or Washington Post that wants your name and email address, hit the Back button. The web will evolve to become a better place.
Those who argue that all good web sites and all content will vanish without registration are just silly.
Couple of clicks?
For those of us less technologically inclined, can you provide a little more detail?
tia!
the number of references to nytimes here on slashdot does indicate something about the value of that content, considering that nytimes is not even a technically oriented paper. It just has good coverage (what's a Jayson Blair or two among friends... read with a grain of salt, but still).
PS: besides, complainers are lame since nytimes allows several ways around their registration besides caching (there are the partner links, and there is the fact that you don't have to read it if you go through google or other partners that don't require registration).
or better yet... the nytimes doesn't use that email for anything... you just set your own password, you can use any old email address...
-pyrrho
Hehe - tell these guys about it:
asdf.com
Yeah, I always like to try abuse@domain for sites that require registration. Kinda mean to the postmaster, but if I "opt-out" and they still send something then they're spammers anyways.
Nothing to see here; Move along.
well, maybe because it IS a quagmire... our guys are getting killed there every day since the so-called "end of hostilities"
This debate would be a lot easier if we had a nice Google cache of NYT coverage of the middle east ;-)
The thing that disturbs me the most about the Israel vs. Palestine situation is what it says about human nature considering the descendants of the victims of Nazi monsters are now acting... well, just like the Nazis. They've turned entire cities into concentration camps and taken to bulldozing houses with people inside.
but those links provided aren't to google's cache of the site, but to the NYTIMES site directly with a flag that turns off their "require login" scheme... the simple variable change is &partner=GOOGLE
telling google not to cache already happens; they need to fix the hole on their end.
MARIJUANA, SHROOMS, X: ONLINE?! - E
yes, I agree, and would like to subscribe to your newsletter
Seems the perceived problem is 'dead' links. Isn't an archive of NEWS of all things a GOOD thing??? Sometimes you want to see the old/original version of whatever.
My favorite: nobodies@home.com
Also: fyou@your.ass
Try the Wayback Machine. It caches EVERYTHING - NO EXCEPTIONS.
OH NOES!!! IT APPEARS YUO DO NOT HAVE ENOUGH MONEY TO PAY FOR DIS HERE PIZZA! WAHT EVER ARE YOU GOING TO DO!?!?
What I do at sites like NYTimes is to use a generic login that can be shared with many other people. This will help with concerns of privacy and being tracked.
:)
Start with username "nospam", password "nospam". Enter bogus information when registering the account, and use a throwaway email address. If that username/password is already taken, advance to the next one: username "nospam2", password "nospam2", and so on.
At NYTimes, I registered this, and currently use the following:
nospam2/nospam2
This works great. If this should ever stop working, hopefully someone will register nospam3/nospam3. I'm glad to see this is becoming a standard. At several sites now, I can just try nospam/nospam, maybe nospam1/nospam1, then nospam2/nospam2, and so on. Often, I get in, or just create the account if it hasn't been already created.
Try this the next time you get nagged for registration when trying to read a newspaper
Dr. Demento On The 'Net!
Aren't there Tripod hosted pages that get around this already? The google cached version of tripod pages tends to cover the background with "Hosted by Tripod" images, making the text near impossible to read, forcing the frustrated user to click on the non-cached link, which doesn't have these images in the background. (Although, it may make google translations difficult as well.) I'm not exactly sure how tripod does this, but they seem to do it pretty effectively.
My significant woman tried to register big_trash as an email login name, but it was already taken. She had to use Big_big_trash, instead.
-1 redundant ? jeeze you moderators are mean today. There was only one other post asking that when I posted which I didn't see until later as it was buried in a thread.
I didn't see anyone answering the question either.