Publishers Seek Change in Search Result Content

The Text I Actually Submitted by explosivejared · 2007-12-02 08:37 · Score: 5, Interesting

When I submitted this I added that a lot of times the more I see in a search result, the more likely I am to hit that website. I know going in that the search engine isn't going to have the full story. It's a summary. That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that. Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

--
I got a catholic block.

Re:The Text I Actually Submitted by Anonymous Coward · 2007-12-02 09:00 · Score: 3, Interesting

It may be the case that they have ads on the page for which they get paid by the page view, and by allowing search engines to show a summary, you may be saved from going to the page, depriving them of revenue.

However, I tend to agree with you, and when I don't see a relevant summary, I'm simply less likely to click through to the page, so this may well backfire on them. Either they're not understanding search users' usage patterns, or else they believe that this is so prevalent that nothing will have summaries, and searchers will be forced to click through to find anything.
Re:The Text I Actually Submitted by iminplaya · 2007-12-02 09:09 · Score: 3, Insightful

Not only that, but if the big search engines start restricting search results, we might see many more "home grown" search engines fill the net with spiders that won't respect robots.txt, and start clogging the tubes which are already getting clogged with advertisement. As it is, I don't care if the publishers rot.

--
What?
Re:The Text I Actually Submitted by wizardforce · 2007-12-02 09:22 · Score: 2, Interesting

Tom Curley, the AP's chief executive, said the news cooperative spends hundreds of millions of dollars annually covering the world, and that its employees often risk their lives doing so. Technologies such as ACAP, he said, are important to protect AP's original news reports from sites that distribute them without permission.

Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site.
I hope they realize that restricting search engine crawlers with robots.txt this way really doesn't do much other than decrease the number of people who visit their site in the first place. I wonder how that will alter their revenue streams. Let them go ahead with it and the whole thing will be self-correcting.

--
Sigs are too short to say anything truly profound so read the above post instead.
Re:The Text I Actually Submitted by fm6 · 2007-12-02 09:41 · Score: 3, Informative

Even without your comments, your submission is way too long. You quoted nearly one third of the article! Next time, take the time to summarize the article in a few sentences. Not only will that make room for your opinions, it will make for a more readable submission that's more likely to be accepted.
Re:The Text I Actually Submitted by pla · 2007-12-02 10:36 · Score: 4, Informative

Hopefully someone can explain this to me, as the stuff in the article led me to believe the publishers are making a big mistake.

Simple - They want to have their cake and eat it too.

They already have the absolute power to block Google. Further than that, Google (and every major search engine out there) honors the robots file, so they don't even need to go so far as actually "blocking" Google, they can politely tell it to go away.
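
For anyone who hasn't seen one, here is a rough sketch of what "politely telling it to go away" amounts to, checked with Python's standard `urllib.robotparser`. The rules, the host name `news.example.com`, and the bot name `SomeOtherBot` are made up for illustration; this is not any real publisher's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher could serve to turn one crawler away
# while leaving the site open to every other robot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

def allowed(agent: str, url: str) -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

# Illustrative URL only; nothing is fetched over the network.
googlebot_ok = allowed("Googlebot", "http://news.example.com/story.html")
other_ok = allowed("SomeOtherBot", "http://news.example.com/story.html")
print(googlebot_ok, other_ok)
```

The point of the sketch is that the opt-out is a few lines of text, and compliance is voluntary on the crawler's side.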

However, doing that amounts to committing web-suicide for any online content producer, and the publishers know it. So they can't really do that. Thus, they bitch and whine about the unfairness of all the traffic (and corresponding ad revenue) Google brings them, for the sake of the very small number of "lost" hits resulting from people getting a sufficient answer directly from the search results page.

Can you hear the violins?
Re:The Text I Actually Submitted by ShieldW0lf · 2007-12-02 10:56 · Score: 2, Interesting

As it is, I don't care if the publishers rot.

I do. Every time I hear about something like this, the site goes on my CustomizeGoogle blacklist, never to be seen again. It was the slashdot policy of posting "registration required" links to the New York Times that got me started on this path, and honestly, I'm better informed for it. All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by private interests who stand to gain by your being misinformed. They make you stupider for having been exposed to them. Anyone with integrity has already adapted or left long ago, and those that are left are personally responsible for the wreckage. I hope airplanes land on their heads.

--
-1 Uncomfortable Truth
Re:The Text I Actually Submitted by Anonymous Brave Guy · 2007-12-02 11:05 · Score: 2, Insightful

Wow. Don't you think you're overreacting, just a little?

Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:The Text I Actually Submitted by LiquidCoooled · 2007-12-02 11:29 · Score: 2, Insightful

That's the problem.

They want you on their site, but they want the power to summarise and manage their search engine face to maximise foot traffic whilst not giving the whole story away.

--
liqbase :: faster than paper
Re:The Text I Actually Submitted by ShieldW0lf · 2007-12-02 11:34 · Score: 4, Interesting

Wow. Don't you think you're overreacting, just a little?

Your sig is particularly ironic here. If you want information to be free, you're welcome to offer to pay the salaries of all the journalists, reporters, cameramen, sound crews, and support folks who are out there all over the world collecting it. Go ahead and put your money where your mouth is.

I am.

I'll be launching a service in the new year to help actively creating artists make a profit off selling original works, leveraging the copyleft and mashup cultures to generate a fanbase and simultaneously devalue the global copyright pool.

For the right types of creators, the strategy of increasing the amount of budgets available for custom work by annihilating the cost of existing bodies of work is a valid one, and I intend to make it very easy for those types of people to do so as a side effect of their making money off the things that you cannot copy.

You'll excuse me if I wait till the new year to slashdot myself, but I assure you, I have sunk hundreds and hundreds of man hours and a lot of my own dough into putting my money where my mouth is, and when I'm ready, you will know all about it whether you like it or not, because it will be some noteworthy stuff.

So no. I don't think I'm overreacting at all. I like to think when it all pans out in the end I'm going to play some small but important personal role in bringing the old things crashing down as a matter of fact. And have the people doing the real work be richer for it.

--
-1 Uncomfortable Truth
Re:The Text I Actually Submitted by Anonymous Brave Guy · 2007-12-02 12:43 · Score: 1

Well, for what it's worth, I admire your dedication and willingness to give it a shot. I can't help but suspect it will only be a drop in the ocean, but best of luck to you. If you make it and in five years we're all having a very different discussion, I will be more than happy to concede that I was wrong today.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:The Text I Actually Submitted by Garrynz · 2007-12-02 12:50 · Score: 1

But why is it opt out when every other media is opt in?
Re:The Text I Actually Submitted by 1u3hr · 2007-12-02 14:40 · Score: 3, Informative

But why is it opt out when every other media is opt in?
1) "Every other media is opt in" -- not true. Fair use applies for most media, allowing summaries and brief quotes without permission, which is what this is about. E.g.: Watch your TV news and you'll often see video taken from other TV news shows, clearly often without explicit permission -- does any US station pay Al Jazeera when they use their video?
2) The web has always been "opt-out". Thus if you change this assumption, the vast majority of web pages, with no expressed policy, would be excluded from search engines.
Re:The Text I Actually Submitted by TheLink · 2007-12-02 15:45 · Score: 1

Sounds cool.

Is there a way to pay/tip OSS coders directly? I suppose that might not be such a great thing, as it becomes a popularity contest - and some code, though vital, might not attract as much attention from the masses.
--
- Too many replies beneath your current threshold
Re:The Text I Actually Submitted by larry bagina · 2007-12-02 16:05 · Score: 1

I find that a little hard to believe. First of all, because I gave my boyfriend oral sex for the first time today. We're not worried about STD's we've never been with anyone else and have both been tested. We didn't use protection and he cummed in my mouth because we both wanted him to. Is there anything he can do in his diet to make it...taste...better?
Next time, oral before anal.

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:The Text I Actually Submitted by MidnightBrewer · 2007-12-02 16:07 · Score: 1

What you're suggesting is that you don't care about the livelihood of the people who supply you with the information you read every day on the web. Sure, they could stop publishing tomorrow, but then we'd all have to go back to hobbies that don't involve reading on the computer.

--
"Give a man fire, and he'll be warm for a day; set a man on fire, and he'll be warm for the rest of his life
Re:The Text I Actually Submitted by iminplaya · 2007-12-02 16:25 · Score: 2, Insightful

You got it wrong. I don't care about the livelihood of the people who try to restrict and monopolize the information I read everyday on the web. Big difference there.

--
What?
Re:The Text I Actually Submitted by FuzzballtheGreat · 2007-12-02 18:10 · Score: 1

I'm currently doing a masters course in which we deal with just these kinds of problems. There are so many things going on, and so many things are hated by the publishing world. It seems like they are mostly stuck in their own traditional (and rather rigid) business models, simply going "Ooo, digital age, it's evil, go away", instead of taking the opposite stance of taking up on the new technologies and turning them into something for their own benefit. This certainly won't be the last thing we hear of such a thing...

--
"As for believing things, I can believe anything, provided that it is quite incredible" - Oscar Wilde
Re:The Text I Actually Submitted by cheater512 · 2007-12-02 20:41 · Score: 1

Yes, granted, most of their stories are fluff, but I doubt you could get the full story from the summary search engines provide.
Re:The Text I Actually Submitted by SnowZero · 2007-12-02 21:29 · Score: 1

They want you on their site, but they want the power to summarise and manage their search engine face to maximise foot traffic whilst not giving the whole story away.

When other content sites do that, it's called cloaking, and it is (rightly) frowned upon by search engines. If you give news sites the power to summarize, who enforces that the summary has anything to do with the story? Also, if the story can be "given away" in the three sentences in a google summary, maybe the news sites should write better stories.

In print, when I reference another writer's work, I get to choose the quote. This new online restriction is essentially giving select writers the power to control how they are quoted. This would not be tolerated in print, and it should not be tolerated online. The funny thing is, news sites make a living by misquoting people in interviews and presentations, by making highly selective edits to paint a particular spin they want for a story. In my old line of work, I was interviewed a few times a year, and while many press people were decent, far too many were looking for the most controversial sound bites rather than a summary of my actual views; I would have to speak in a very guarded manner to make sure every single sentence would still sound ok when taken out of context. So, if the press wants the power to control the summarization and references to its own works, it ought to give the same control to those providing it with content. Of course, the press would never give up that power, and would claim that it would ruin their industry. In that case, why should we let them do the same thing to search engines?

So, how about we just stick with people deciding on their own quotes and summaries of others' work? It may not be perfect, but it doesn't give any one party too much control compared to others. It's been working adequately for the internet for 10 years, and for hundreds of years in print.

P.S. L.C., We probably don't disagree on these issues, but your post is what inspired me to write. Mostly it's aimed at the news sites trying to push this new form of control.
Re:The Text I Actually Submitted by vrmlguy · 2007-12-03 02:16 · Score: 1

If you give news sites the power to summarize, who enforces that the summary has anything to do with the story?

If you read the ACAP standard, you'll see nothing like that. Instead, the news sites get to say how the link is summarized: you may specify a "snippet" (which is a summary computed by the search-engine, not the news service), or an excerpt, or the whole web-page (think Google image search or those annoying pop-ups that a lot of blogs are now using on their links).

--
Nothing for 6-digit uids?
Re:The Text I Actually Submitted by Zenaku · 2007-12-03 02:29 · Score: 1

All these big "news" publishers deliver is sanitized, oversimplified, dumbed down, biased and superficial stories blended with propaganda and outright lies concocted by private interests who stand to gain by your being misinformed.

I was going to quip that you forgot about ads, but then I realized that even though you didn't use the word, you did use its definition: outright lies concocted by private interests who stand to gain by your being misinformed.

So. . . my bad.

--
If fate makes you a motorcycle, you become a motorcycle.
Re:The Text I Actually Submitted by vrmlguy · 2007-12-03 02:29 · Score: 1

"Home grown" search engines cause their own problems, above and beyond just clogging the tubes: http://yro.slashdot.org/article.pl?sid=07/12/02/1515247

--
Nothing for 6-digit uids?
Re:The Text I Actually Submitted by Tanktalus · 2007-12-03 06:09 · Score: 2, Funny

As a slashdot user, I *only* look at the summaries. I don't click to read the actual article, but learn everything I need to know about a subject simply by the summary available on google.

It works fine here, so why not on google?
Re:The Text I Actually Submitted by amRadioHed · 2007-12-03 10:24 · Score: 1

Sure, they could stop publishing tomorrow, but then we'd all have to go back to hobbies that don't involve reading on the computer.

Sweet. The sooner the better. I waste more time on this damn thing...

--
We hope your rules and wisdom choke you / Now we are one in everlasting peace

well by Anonymous Coward · 2007-12-02 08:39 · Score: 0

sites can say what they want shown in the result...

the day i can start blacklisting results.

i get enough ads already.

Re:well by HiThere · 2007-12-02 10:36 · Score: 1

I think you don't understand the topic.

Robots.txt already allows sites to tell search engines what to index, and what not to.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:well by LiquidCoooled · 2007-12-02 11:33 · Score: 1

And the same netiquette also says you don't give one answer to the spider and another to the end-user.

They want it all.

--
liqbase :: faster than paper
Re:well by HiThere · 2007-12-03 04:35 · Score: 1

I don't think robots.txt has EVER been interpreted that way. Robots.txt is instructions to robots (i.e. web spiders) on how to behave. It says little to nothing about what information will be presented to a person who browses to there.

Well, the person who browses there should be able to see at least all that the robot can see...so that's a limitation, of sorts. Robots.txt is purely intended to restrict how robot web searches act.

If you meant that the end-user should be able to see at least as much as the web spider, then I misunderstood you. That's just occurred to me as a potential meaning of what you wrote.

Yeah, they want it all. Of course. Who doesn't. What's wrong is that they feel entitled to have what they want even if it hurts others.

--

I think we've pushed this "anyone can grow up to be president" thing too far.

So they tell you what they don't want you to see? by Anonymous Coward · 2007-12-02 08:41 · Score: 5, Interesting

Hmm, i wonder how long before someone opens a search engine that indexes only what is "hidden"(yeah, really...) by the ACAP settings.

Just don't do it in the US or someone will tell the judge: "The defendant knowingly circumvented the DRM - which is called ACAP - of our online newspaper".

ACAP - Anonymous Coward Anonymously Posting

This says it all... by doas777 · 2007-12-02 08:42 · Score: 4, Insightful

from TFA: "The free riding deprives AP of economic returns on its investments," he said.

same old rule applies; never trust anyone who uses business terms like ROI, for he cares not for you or society, but only for what he can remove from your wallet, without getting arrested over it.

Re:This says it all... by Seumas · 2007-12-02 09:48 · Score: 2, Insightful

I don't see how difficult this is. If you do not want something on the internet, don't put it on the internet. It's not like Google is going out there and signing into a paying account and indexing paid-for content. In fact, how many times have you found something on google, clicked it -- only to find that all you can read is one paragraph before the site (NYT, etc) throws you a sales pitch to pay $5 or $20 if you want to read the rest?

My opinion? Good riddance to the lot of them. Please take all the "yahoo answers" and "mahalo" results that show up in various searches with you.

If news outlets and publishers had their way, google would have to pay them for the privilege to index their sites. And then every time their site's link appeared in a google results page, they'd charge google again. It's a pathetic attempt to try to wrangle some revenue out of a failing concept.
Re:This says it all... by alexgieg · 2007-12-02 12:37 · Score: 1

We should revert the argument by requiring news reporting agencies to pay for facts. After all, one might argue, what they do is just sending people around to look and tell what they've seen. They don't "produce" anything! Is it fair that all the people who produced facts to be seen and told not be paid for them? And how about those photos of buildings? And of people walking on the street? All of them should be paid too! It's time we stop news reporting agencies from leeching the hard work of fact producers without paying back their fair share!

Me? Well, I grant any news reporting agency the right to look and tell anything about me not prohibited by law for free, provided such reportings about me are in their entirety freely available to news aggregators. Don't like it? Then please pay me $1000 per mention of any data directly related to me. Oh! And I accept PayPal!

--
Conservatism: (n.) love of the existing evils. Liberalism: (n.) desire to substitute new evils for the existing ones.

My reaction... by Z80xxc! · 2007-12-02 08:44 · Score: 5, Insightful

Personally, I think that it's useful for Google and other search engines to show what's truly relevant when you're searching for a page. The fact is, I'm more likely to click on a search result if I can see some of the actual content, and more specifically, the actual text or images that I was looking for. If they don't show me what I want to see, I won't see the rest of it. If it only shows some text that they decide I should see, then it makes it much harder to determine what I'm actually looking at. Even as it is now, when results come up that are ambiguous, I find myself less likely to click on them. I readily admit that robots.txt is getting old and isn't really enough any more, but I'm not sure if what they're proposing is the right answer. Additionally, if Google were to implement a new method of searching using ACAP, then what would happen to the sites using the old methods? Would they not be indexed? What if I want all my material to be shown and I don't feel like going through and choosing every little detail about what to allow and not to allow? It's an idea worth looking at, but it's not anywhere near a finished, usable idea.

Re:My reaction... by maxwell demon · 2007-12-02 08:48 · Score: 1

What happens if you don't have a robots.txt? I guess the site will just be treated as having an implicit global permission. I'd expect the same to be true for ACAP: If it isn't there, fall back to current behaviour. Especially if a robots.txt is there.
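
A quick way to see the "implicit global permission" default is to hand Python's standard `urllib.robotparser` an empty rule set, which stands in here for a site with nothing to say. This is only a sketch: the bot name `ExampleBot` and the URL are placeholders, and it says nothing about how ACAP itself would be handled.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_lines, agent, url):
    """Apply the given robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

PAGE = "http://example.com/page.html"  # placeholder URL, never fetched

# No rules at all: the parser falls back to "access granted".
empty_site = can_crawl([], "ExampleBot", PAGE)

# An explicit site-wide opt-out, for contrast.
closed_site = can_crawl(["User-agent: *", "Disallow: /"], "ExampleBot", PAGE)

print(empty_site, closed_site)
```

So silence means "go ahead" under the existing convention, which is the fallback being suggested for sites that never adopt the new scheme.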

--
The Tao of math: The numbers you can count are not the real numbers.
Re:My reaction... by fatphil · 2007-12-03 00:51 · Score: 1

Clients/robots/etc. do not _take_ data off a server, they _request_ data off the server. The server decides if the client has permission, and then serves that content if it does.

This 'implicit global permission' you speak of is simply a configuration of the website. There's nothing 'implicit' about it, it's a particular configuration that you've actively chosen to have. Sure, your 'choice' process may have been nothing more than "I'll accept the defaults", but that's still an active choice. If a client's request was granted, it had permission to access the data. That's the definition of permission. If you want to restrict permissions, then use .ht* and robots.txt, and logins, and captchas, etc. If you don't do that, then you are _choosing_ to serve everyone everything. Your choice, don't complain about it.

--
Also FatPhil on SoylentNews, id 863

Terms & Condition by SaidinUnleashed · 2007-12-02 08:45 · Score: 5, Insightful

I really wish that the AP and other similar entities would realize that no matter the legal backing of their terms and conditions of redistribution very few people actually care, and people care less every day. At Burger King, they provide a copy of the newspaper. Does the AP get money for every reader? I think not. This is just as ridiculous as it would be if they tried to make Burger King pay for every person who reads the newspaper while in the restaurant.

--
Shiny. Let's be bad guys.

Re:Terms & Condition by peragrin · 2007-12-02 09:29 · Score: 1

actually yes they do. Burger King buys a dozen newspapers and leaves them out for their customers. at $.50 each it is a cheap way to get people to come in the door in the morning.

but your analogy isn't correct. It's more like a library charging people to look through the catalog to see if the books they want are present.

--
i thought once I was found, but it was only a dream.
Re:Terms & Condition by shawb · 2007-12-02 09:38 · Score: 1

Yes, there may be a gratis copy of the newspaper at Burger King. And the AP DOES receive financial benefit from that newspaper sitting there... because Burger King leaves the ads in. If Burger King were to put copies of all the newspaper stories out with advertisements stripped out, then they would probably get a cease and desist fairly quickly. That could be construed as similar to a search engine displaying the bulk of what is displayed in a news story. I don't think the AP is so much looking to stop the current methods of Google or Microsoft Live so much as preventing news aggregators from crawling websites and posting essentially the whole article. The latter would probably fall under copyright infringement, and likely commercial copyright infringement, as the aggregator would no doubt strip the ads from the original work and put their own in to make money off the venture. While there is indeed some controversy over whether making copies of a work for personal use constitutes intellectual property theft, making wholesale copies of a protected work for profit is probably seen as A Bad Thing(tm) even by a decent portion of libertarian leaning Slashdotters (not implying that the majority of Slashdotters are libertarian, just referring to the subset of Slashdotters who are libertarian as a drastic example.)

--
I'll never make that mistake again, reading the experts' opinions. - Feynman
Re:Terms & Condition by jonbryce · 2007-12-02 09:50 · Score: 1

For Burger King, yes.

But how many people read these dozen newspapers? I would guess a lot more than twelve people.

Here's the documentation by Wesley Felter · 2007-12-02 08:46 · Score: 5, Informative

http://www.the-acap.org/project-documents.php

At first glance it appears to be a set of extensions to robots.txt that allow newspapers to specify things like:
This article will disappear from our site in N days, so it better disappear from search engines at the same time
Don't frame this article
Don't extract images or thumbnails from this article
If you show a cached copy of this article, it better include the original ads
etc.
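
To make the list above concrete, here is one way a crawler might model those per-article permissions internally. This is purely a sketch: the class and field names are invented for illustration and are NOT ACAP syntax, which is defined in the linked project documents.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class UsagePolicy:
    """Hypothetical per-article policy; field names are illustrative, not ACAP syntax."""
    published: date
    expires_after_days: Optional[int] = None  # drop from results after N days
    allow_framing: bool = True                # "Don't frame this article"
    allow_thumbnails: bool = True             # "Don't extract images or thumbnails"
    cached_copy_needs_ads: bool = False       # cached copy must keep the original ads

    def may_list(self, today: date) -> bool:
        """True while the article may still appear in search results."""
        if self.expires_after_days is None:
            return True
        return today <= self.published + timedelta(days=self.expires_after_days)

# Example policy with made-up values: expires a week after publication.
policy = UsagePolicy(
    published=date(2007, 12, 2),
    expires_after_days=7,
    allow_framing=False,
    allow_thumbnails=False,
    cached_copy_needs_ads=True,
)
print(policy.may_list(date(2007, 12, 5)), policy.may_list(date(2007, 12, 10)))
```

Each field is something the search engine would have to track and enforce per article, which is where the extra overhead people are complaining about comes from.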

Re:Here's the documentation by Anonymous Coward · 2007-12-02 09:38 · Score: 0

The question I have is how much are they willing to pay the search engines for this functionality? It's one thing to request that certain content not be indexed, but it's another to add requests that turn the search engine page into a sort of trial news forum with advertising.
Re:Here's the documentation by Anonymous Brave Guy · 2007-12-02 11:17 · Score: 1

The question I have is how much are they willing to pay the search engines for this functionality?

I think you have it backwards. The real question is how much the search engines are willing to pay the news sources they rely on for continued permission to reproduce their work in any form at all.

And for those about to comment about how for some magical reason copyright doesn't apply here, please note the details in TFA about the settlements between Google and a couple of major sources that have already taken place. Someone who's checked with the lawyers doesn't think it's as cut and dried as that.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Here's the documentation by russotto · 2007-12-02 11:28 · Score: 1

I think you have it backwards. The real question is how much the search engines are willing to pay the news sources they rely on for continued permission to reproduce their work in any form at all.
That would be... nothing. Cipher. Zero. Zip. Null. As has been pointed out ad nauseam, if the news sources don't want Google indexing their site, they can use robots.txt. Of course, they don't really want that; they want to be indexed, but they want to be indexed their way, and not the indexer's way. Tough tootles. Copyright is a red herring, really, because Google is already willing to not index their sites; copyright doesn't give them a magical way of forcing Google to index the sites their way.
Re:Here's the documentation by mdmkolbe · 2007-12-02 12:31 · Score: 1

You have a very good point here. Google is powerful enough that if they don't like particular ACAP settings, they'll just treat it as equivalent to a robots.txt with disallow set and watch the content providers come crawling back. The publishers are trying to force something on the indexers that requires extra overhead and maintenance, so it's just simpler to treat an ACAP that says "don't index unless you do xyz" as "don't index".
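
The "just treat it as don't index" policy is about as simple as crawler logic gets. A minimal sketch, assuming a crawler that only implements plain allow and disallow rules; the directive name `require-ads-in-cache` is invented for the example and is not taken from ACAP or any real spec.

```python
# Directives this hypothetical crawler implements; names are illustrative only.
SUPPORTED = {"allow", "disallow"}

def decide(directives):
    """Return 'index' or 'skip' for a site given (name, value) directive pairs.

    Anything the crawler does not implement is treated as a blanket
    disallow, which is the cheap policy described above.
    """
    for name, value in directives:
        name = name.strip().lower()
        if name not in SUPPORTED:
            return "skip"  # conditional or unknown rule: don't bother
        if name == "disallow" and value.strip() == "/":
            return "skip"  # explicit site-wide opt-out
    return "index"

plain_site = decide([("Disallow", "/private/")])
conditional_site = decide([("Allow", "/"), ("require-ads-in-cache", "yes")])
print(plain_site, conditional_site)
```

No extra bookkeeping, no per-publisher special cases; the publisher with conditions simply drops out of the index.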
Re:Here's the documentation by Attila Dimedici · 2007-12-02 14:35 · Score: 1

I think there is a place for some of these extensions to robots.txt. However, the first reference I ran across for this group was just after one of the major news organizations got major egg on its face for a news article that had blatantly false and biased information in it. When someone publicized this glaring bias/falsehood, the original news organization quickly changed their web site and denied that the article ever said that. The problem was that the original web article was already cached by Google and the whistle blower went there and got a screen capture of the way the story was originally posted.
I don't remember the details exactly, so it may not have been Google, but another third party organization that creates a cache of sites visited by their "clients" (for some definition of client). I also have a vague recollection that the original site was not intended for the general public (although it was not restricted); the general public was never supposed to be aware of the blatant attempt at misinformation. When they were caught they thought that they could change it and no one would ever be able to prove what it originally said. Within a week of that episode, I came across an article from this organization pushing the idea that there needed to be some way to change the way search engines index and summarize web sites.

--
The truth is that all men having power ought to be mistrusted. James Madison
Re:Here's the documentation by Sqrlly · 2007-12-02 15:55 · Score: 1

That's what I would do. As long as the search engines allow a means to opt-out, content providers shouldn't have any right to tell search engines how they do their job, anymore than the search engines can tell them what content to put up or how to design their pages. If you don't like it, opt-out. It's real simple, and most search engines will even do it retroactively to previously scanned content. Nothing indexed, no copyright violated.
Re:Here's the documentation by vrmlguy · 2007-12-03 01:55 · Score: 1

It has happened several times with the Internet Archive. Legally damaging information gets found there all the time, usually five minutes before the site decides that having a robots.txt file is a good idea. Then the site starts complaining that the IA should honor the file for content that was archived before the file was in place.

--
Nothing for 6-digit uids?
Re:Here's the documentation by vrmlguy · 2007-12-03 02:07 · Score: 1

if the news sources don't want Google indexing their site, they can use robots.txt.

Technically speaking, since ACAP consists of extensions to robots.txt, the sites *will* be doing just that. I imagine that we'll start seeing lots of files that start with "Disallow: *" and then use the ACAP extensions to allow indexing. And that sword cuts both ways: Once enough sites start doing that, search engines that don't understand the extensions will have less content than the ones that do.
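
A small sketch of the "sword cuts both ways" effect, using Python's standard `urllib.robotparser` as the crawler that doesn't understand extensions. Assumptions, loudly: the `X-Example-Allow-Snippet` line is a made-up stand-in for an extension and is not real ACAP syntax, the extension-aware function is hypothetical, and the blanket rule is written as `Disallow: /` because that is the conventional form this parser matches on.

```python
from urllib.robotparser import RobotFileParser

# Blanket disallow plus one invented extension line (NOT real ACAP syntax).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
X-Example-Allow-Snippet: /news/
"""

def legacy_view(path: str) -> bool:
    """What a crawler that ignores unknown lines concludes about `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("ExampleBot", path)

def extension_aware_view(path: str) -> bool:
    """Hypothetical crawler that also honors the made-up extension line."""
    for line in ROBOTS_TXT.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "x-example-allow-snippet":
            if path.startswith(value.strip()):
                return True
    return legacy_view(path)

print(legacy_view("/news/story.html"), extension_aware_view("/news/story.html"))
```

The older crawler sees only the disallow and indexes nothing, while the newer one gets the `/news/` section, so the engine that skips the extensions ends up with less content.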

I see many good results from this. Right now, if someone kills someone else in a restaurant, news search sites will often include unrelated content, like a review of said restaurant. Likewise, I find search results that point to non-existent pages. In both cases, the ACAP extensions can tell the engine that this page isn't breaking news and that page will be gone tomorrow.

--
Nothing for 6-digit uids?

Seriously by Anonymous Coward · 2007-12-02 08:47 · Score: 5, Insightful

If you don't want anything to be indexed or archived, it needs to be behind a secure connection or NOT POSTED AT ALL.

Re:Seriously by Anonymous Coward · 2007-12-02 15:51 · Score: 1, Funny

and if you don't want to be logged, don't chat on irc
Re:Seriously by Jugalator · 2007-12-02 19:57 · Score: 1

Agreed, if they can't even protect themselves from a harmless Googlebot, imagine if a person with an agenda would try to access that information...

--
Beware: In C++, your friends can see your privates!

Here's a tip... by Digital+Vomit · 2007-12-02 08:48 · Score: 5, Insightful

Here's a tip:

If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.

--
Modern copyright is theft of culture from everyone and it retards the progress of the useful arts and sciences.

Re:Here's a tip... by hedwards · 2007-12-02 09:02 · Score: 2, Interesting

That was largely my thought. It makes very little sense as to why anybody would blind click on a link in this day and age. I personally depend upon the summaries to decide whether or not to click. If I don't get a summary I don't click.

It would make far more sense for these institutions to just take their sites completely off of the search engines via robots.txt and save up those slots in the search results for sites that want traffic. Or perhaps limit it to just the front page, but I think that one can still do that with a competently crafted robots.txt as well.
Re:Here's a tip... by Zero__Kelvin · 2007-12-02 09:25 · Score: 0

"If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet."
Thank God for your wisdom and the willingness to share it! I'll tell Wachovia, Bank of America, and everyone who has a computer attached to the internet that uses authentication/authorization immediately!

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:Here's a tip... by wilder_card · 2007-12-02 09:41 · Score: 1

Nope, sorry, he's right. Authentication of bank customers is one thing, posting articles is something else. The Internet was designed for sharing information and for free linking. A lot of people are trying to use the Internet but want it to work THEIR way. To which I say, you don't like it, build your own freakin' network.
Re:Here's a tip... by garcia · 2007-12-02 10:08 · Score: 1

If you don't want something to become public knowledge -- accessible by anyone -- then don't put it on the internet.

Or use the sitemaps protocol to let them spider what semi-private information you want to offer and let users of your site decide whether or not it's worth their time to login (or whatever authentication method you choose) to read what they deem acceptable.
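
For what it's worth, a sitemap is just a small XML file listing the URLs you want spiders to know about. A minimal sketch with Python's standard library follows; the teaser URLs are invented for illustration, and only the required "loc" element of the sitemaps.org 0.9 schema is used.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical public teaser pages; the full articles would sit behind a login.
sitemap_xml = build_sitemap([
    "http://example.com/articles/teaser-1.html",
    "http://example.com/articles/teaser-2.html",
])
```

Anything not listed can still be found by a spider that follows links, so this only advertises what you want crawled; it does not hide the rest.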

If you put shit up on the web for everyone to read, that will include spiders, so stop whining when public information is read by, *gasp*, everyone.

This is a non-issue and one that people should fucking stop whining about and get some common sense about.
Re:Here's a tip... by Anonymous Coward · 2007-12-02 10:13 · Score: 0

yes, because if I make something publicly available, I lose all rights to it

like the author of a book

or a band that publishes music

or the inventor of a patented deployed technology

or a movie studio that dares to put a movie in a public theatre

sheesh, where do you live? China?
Re:Here's a tip... by Anonymous Coward · 2007-12-02 13:02 · Score: 0

Sorry, that's not the way it works. Sites can reference your original work and even reproduce chunks of the text for comment (like Slashdot regularly does) if it is for the purpose of public commentary, but websites may NOT reproduce large chunks (or all) of your text. Sites are copy-restricted based on the fact that the original work is copyrighted. Now do you see the problem? If site XYZ posts an article on topic ABC, and then the topic becomes unfashionable, site XYZ can pull the article and everyone will collectively forget the content, except for the small chunks that were used for commentary. That is why these terms-of-use are so important to these media companies.
Re:Here's a tip... by Score+Whore · 2007-12-02 17:44 · Score: 1

The Internet was designed for sharing information and for free linking.

You must be new here. I thought the internet was designed to route around breakage.
Re:Here's a tip... by Fnordulicious · 2007-12-03 06:41 · Score: 1

I believe that he meant the "Interweb", not the "Internet".

Historical footnote: where robots.txt came from by charlie · 2007-12-02 08:48 · Score: 5, Interesting

My one lasting legacy on the web ...

Back in 1993, when I was teaching myself Perl in my spare time (while working for a -- cough -- UNIX company called The Santa Cruz Operation -- no relation to the current Utah asshats of that name), I was practicing by working on a spider. Now, back then SCO's Watford engineering centre was connected to the internet by a humongous 64kbps leased line. And I was working with a variety of sources on robots, and it just so happened that because I was doing a deterministic depth-first traversal of the web (hey, back then you could subscribe to the NCSA "what's new on the web" bulletin and visit all the interesting new websites every day before your coffee cooled), I kept hitting on Martin Kjoster's website. And Martin's then employers (who were doing something esoteric and X.509 oriented, IIRC) only had a 14.4kbps leased line. (Yes, you read that right: a couple of years later we all had faster modems, but this was the stone age.)

Eventually Martin figured out that I was the bozo who kept leeching all his bandwidth, and contacted me. Throttling and QoS stuff was all in the future back then, so he went for a simpler solution: "Look for a text file called /robots.txt. It has a list of stuff you are not to pull in. Obey it, or I yell at your sysadmins." And so, I guess, my first attempt at a spider was also the first spider to obey the embryonic robot exclusion protocol. Which Martin subsequently generalized and which got turned into a standard.

So if you're wondering why robots.txt is rather simplistic and brain-dead, it's because it was written to keep this rather simplistic and brain-dead perl n00b from pillaging Martin's bandwidth.

Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ...
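
The scheme in the story above ("look for /robots.txt, it lists what not to pull in") is simple enough to sketch in a few lines. This is an illustration of the idea only, not the original Perl and not a complete parser: it ignores User-agent grouping and treats every Disallow value as a path prefix. The function names and the sample file are made up.

```python
from urllib.parse import urljoin, urlparse


def robots_url(page_url):
    """Return the /robots.txt location for the host serving page_url."""
    return urljoin(page_url, "/robots.txt")


def disallowed_prefixes(robots_text):
    """Collect every Disallow path, ignoring comments and User-agent grouping."""
    prefixes = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def may_fetch(page_url, prefixes):
    """A URL is off limits if its path starts with any listed prefix."""
    path = urlparse(page_url).path or "/"
    return not any(path.startswith(p) for p in prefixes)


# Hypothetical file a site owner on a slow line might publish.
SAMPLE = """\
# keep robots out of the slow stuff
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

prefixes = disallowed_prefixes(SAMPLE)
```

A spider would fetch robots_url(page) once per host and call may_fetch() before each request.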

Re:Historical footnote: where robots.txt came from by mondoterrifico · 2007-12-02 09:09 · Score: 1

"Ah, the good old days when you could accidentally make someone invent a new protocol before breakfast ..."

And I hear you aren't a bad writer either. I turned someone on to Peter Watts the other day at lunch (software developers for large grocery chain) and he turned me on to your writing. Now of course you are popping up everywhere, damn synchronicity.
Re:Historical footnote: where robots.txt came from by mboverload · 2007-12-02 09:09 · Score: 3, Interesting

That is one of the coolest stories I've heard in a long time.

I'm fascinated at the beginnings of the web and the people who drove it.

If you know any place where I can hear more of these please let me know. (reading your blog right now)
Re:Historical footnote: where robots.txt came from by Cheapy · 2007-12-02 09:12 · Score: 4, Funny

It figures that perl would be the root of one giant misunderstanding.

--
Would you kindly mod me +1 insightful?
Re:Historical footnote: where robots.txt came from by Cursorkeys · 2007-12-02 11:48 · Score: 1

I agree, that's a fascinating story and I'd love to hear more about your experiences as well.

As a very weird coincidence it seems I've just finished reading one of your SF novels, Singularity Sky, this weekend (loved it). The very cool thing is that I only bought it after Amazon.co.uk recommended it based on my prior purchases... The web has certainly come a long way!
Re:Historical footnote: where robots.txt came from by IAmGarethAdams · 2007-12-02 11:58 · Score: 1

(hey, back then you could [...] visit all the interesting new websites every day before your coffee cooled)
That's still true. All you have to do is find a way to filter out the line noise

(no, that's not a Perl joke)
Re:Historical footnote: where robots.txt came from by Anonymous Coward · 2007-12-02 12:34 · Score: 0

That is one of the coolest stories I've heard in a long time.
Funny thing is, webmasters are still trying to adhere to that document on how spiders should behave almost 15 years later, despite all the changes. I worked for a company that spidered the web, and we chose to embrace the HTTP 1.1 protocol extensions to decrease our impact on web servers, yet the number of idiot webmasters who would scream about conforming to a document that implies their server might be doing real work like gopher blew my mind.
I'm glad somebody is looking into updating these standards...
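
The post doesn't say which HTTP/1.1 features were used. One plausible candidate is the conditional GET, where the spider sends back the validators it saw last time and the server can answer "304 Not Modified" with no body. A sketch under that assumption; the spider name, URL, and validator values are placeholders.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET that lets the server skip the body if nothing changed."""
    headers = {"User-Agent": "ExampleSpider/0.1"}  # hypothetical crawler name
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return the body bytes, or None when the server answers 304 Not Modified."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None
        raise


# Build (but do not send) a revisit request using remembered validators.
request = build_conditional_request(
    "http://example.com/story.html",
    etag='"abc123"',
    last_modified="Sun, 02 Dec 2007 08:37:00 GMT",
)
```

On a revisit where the page is unchanged, the server sends headers only, which is much cheaper for both sides than a full re-download.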

And the link to ACAP... by Bill+Dimm · 2007-12-02 08:48 · Score: 3, Informative

You would think an article about ACAP would provide a link to it.

Re:And the link to ACAP... by Jah-Wren+Ryel · 2007-12-02 08:56 · Score: 5, Funny

You would think an article about ACAP would provide a link to it.
Sorry, their new exclusionary rules prevent any linking to their content.

--
When information is power, privacy is freedom.

What right do they have to limit crawlers? by Entropius · 2007-12-02 08:51 · Score: 5, Insightful

As I understand it the main purpose of robots.txt is to prevent crawlers from consuming excessive amounts of network resources, not to "protect content". It's not a contract; it's not legally-binding; it's a request that automated web agents choose to follow if they want to be polite, or rather a description of how to be polite in the context of a certain site. (Nobody wants crawlers to be indexing dynamically-generated pages, for instance.) As an example, the physics preprint archive arXiv.org has a rather sternly-worded warning: "Follow our robots.txt file or you'll wander off into terabytes of dynamically-generated files, chewing up lots of our bandwidth, and we'll have to ban you to protect our bandwidth bill." That's what it's for, not "protecting content".

Banning Google from visiting a page and then summarizing its result on a search page is pretty much equivalent to Slashdot banning me from saying "There's this article at goatpron.slashdot.org/whatever that has a description of goat bestiality that I think you might find interesting".

As long as the summaries are sufficiently short so that they fall under the fair use exception (which Google search results surely do), Google can keep on doing what they're doing.

Re:What right do they have to limit crawlers? by Jeffrey+Baker · 2007-12-02 08:57 · Score: 4, Insightful

You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.

Now, I realize that these people are idiots, and that probably their future involves a wall, their backs, and a revolution, but at present their counsel is widely respected among the holders of wealth and power. So when you say that robots.txt is "not a contract" you should talk to a lawyer about that. You'd be amazed at the things they say.
Re:What right do they have to limit crawlers? by Bodrius · 2007-12-02 09:33 · Score: 1

I do not think your comparison is valid.

This is more like banning you from copy/paste and re-posting 1/2 or the full article to karma-whore for a +5 Informative, "before the site is down".

Google does not have an AI that passes the Turing-test yet. They don't summarize, paraphrase, or otherwise reinterpret content.

They just extract and render pieces as-is - it's a direct quote.
It falls into fair use only as long as Google doesn't karma-whore too obviously.

The discussion of 'rights' is silly anyway - they have every right to restrict crawlers (and users), if it is their content on their servers.

Technically they can do that already, and some of them have done so in the past (obviously not through robots.txt). They're not doing so because it is not in their best interests - how many competitive content sites still require authentication?

But they have a point - reliable content is not free-as-in-beer to produce, particularly news.
From the article, it sounds like they just want a much more flexible robots.txt. That's not a bad thing - it would allow everyone to define a richer good-will contract if needed, rather than solve that through a maintenance/navigational hacky nightmare on the site.

Of course they're horribly optimistic about the rate of adoption of a new standard, and on how much they'd get away with in terms of restrictions while keeping their relevance. But that is a problem that the market will quickly correct, at their cost, as it has in the recent past.

--
Freedom is the freedom to say 2+2=4, everything else follows...
Re:What right do they have to limit crawlers? by lobStar · 2007-12-02 09:41 · Score: 1

As I understand it the main purpose of robots.txt is to prevent crawlers from consuming excessive amounts of network resources, not to "protect content".
And also help the crawlers to avoid pages which are meaningless to index and only will fill up the search results with nonsense, like short-lived temporary pages, search results etc.
Re:What right do they have to limit crawlers? by Aladrin · 2007-12-02 09:50 · Score: 1

I suspect it -is- legally binding since Google has said that they will honor the robots.txt. If they suddenly stop, they could find themselves in for a lawsuit. (Whether that lawsuit has merit would be determined in a court of law.)

ANAL.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Re:What right do they have to limit crawlers? by piojo · 2007-12-02 10:01 · Score: 4, Insightful

You might find it odd, but there's a lot of lawyers out there (almost all of them, in my experience) who seriously claim that the Terms of Service linked at the bottom of every commercial website is a binding contract. They say it's binding even if you've never read it, and even if it changes and you haven't read the changes. It's binding even if it's not linked from anywhere obvious.
That's true, but I'm interested in whether a computer program has to obey contracts. If I write a program and it breaks contracts, am I immediately responsible, or must someone tell me that the program is breaking contracts? If the program is viewed as a tool or an extension of myself, it's probably the former. But programs are frequently not extensions of myself. For instance, if I downloaded the program, not wrote it, there would be no way I could know it was violating contracts.

--
A cat can't teach a dog to bark.
Re:What right do they have to limit crawlers? by Dr.+Tom · 2007-12-02 10:07 · Score: 1

robots.txt does more than limit bandwidth, it actually *helps* the crawlers by letting them know which parts of a site it may be a bad idea to crawl. Once, some "internet archive" bozo making a "full snapshot" of the web got into our scheduling calendar, which of course is an infinite virtual space, and had downloaded pages going up to about 2030 before I noticed it and shut him down. They said on their website that they were specifically ignoring robots.txt so their snapshot would be more complete. If I hadn't stopped him, he'd still be archiving calendar pages, probably be thousands of years in the future by now.
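
A crawler can also defend itself against this kind of endless link space. The sketch below is a guess at one common approach, not what any particular archive crawler does: a breadth-first walk with a page budget and a depth limit, run against a fake calendar where every month links to the next one. The URL layout and the limits are invented.

```python
from collections import deque


def calendar_links(url):
    """Hypothetical site: every calendar page links to the following month."""
    year, month = map(int, url.rsplit("/", 2)[-2:])
    month += 1
    if month > 12:
        year, month = year + 1, 1
    return [f"http://example.com/calendar/{year}/{month}"]


def crawl(start_url, get_links, max_pages=50, max_depth=24):
    """Breadth-first crawl that stops at a page budget or a depth limit."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links any deeper than this
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Two years of calendar pages at most, instead of crawling toward year 9999.
pages = crawl("http://example.com/calendar/2007/12", calendar_links)
```

Either limit alone stops the runaway; a robots.txt entry for the calendar would save the crawler from even starting down that path.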
Re:What right do they have to limit crawlers? by Jeffrey+Baker · 2007-12-02 10:13 · Score: 1

That's why the whole idea is absurd. Some software acting on my behalf sends a request to your software. Your software can choose to answer the request and how, or to ignore it. As a citizen who has not had the opportunity to undergo the brain-erasure procedure they use at law schools, I fail to see where the contract attaches.
Re:What right do they have to limit crawlers? by Anonymous Coward · 2007-12-02 10:41 · Score: 0

Tease! I'm looking for the link to the goat pron. http://goatpron.slashdot.org/whatever returned, file not found.
Re:What right do they have to limit crawlers? by sjames · 2007-12-02 10:45 · Score: 1

I'm sure they will have all sorts of things to say, particularly if they can get you to pay them to say it. None of that changes the fact that if you print something and put it on public display, don't be surprised if people look at it. If it's at all interesting, some of them may talk about it. If that's not desired, then take it down.

There are sites that do go way too far trying to make the content look like their own, but that isn't going to be fixed by a more complicated robots.txt
Re:What right do they have to limit crawlers? by Anonymous Coward · 2007-12-02 11:28 · Score: 0

Google does not have an AI that passes the Turing-test yet. They don't summarize, paraphrase, or otherwise reinterpret content.

YET, as always, being a very interesting word (especially if you add an "I" to it, but by that point you're looking at a word that might rip your head off, vigorously attach it to the sharpened end of a stick, thus inventing the word "ggroodaa" [lollipop]; in fact this trend in Yeti linguistic evolution will one day ensure that the most frightening sound in alpine ski resorts is the gentle cacophony of an ice-cream van in the distance...). "Yet" is never quite as far away as you think it is, in fact it usually creeps up on you, makes silly faces behind your back until you turn around where it will meet you with a smug, self satisfied smile. "Yet" is, of course, a bastard.

The point being, of course, that we're not THAT far away from computer software being able to take a collection of articles and rewrite them in comprehensible, if not meaningful language differing from the original enough for it to qualify as an original work. Given that this currently only takes the intellectual capabilities of your average high school student, one might hazard a guess that we are likely to start seeing tools capable of this shortly after this generation graduates [i.e. 2018 if they manage to tear themselves away from Myspace long enough].
Re:What right do they have to limit crawlers? by Score+Whore · 2007-12-02 17:58 · Score: 1

Google needs the content producers. If Google ever pisses off the bulk of the mainstream sites to the extent that nobody will allow them to index, they'll find themselves in a bad place. They need quality content to index. Hitting hundreds of blogs and pseudo news sites won't lead to a mad rush to place ads on Google's search results pages.
Re:What right do they have to limit crawlers? by Aladrin · 2007-12-03 01:05 · Score: 1

Yes, but not as much as the content producers need Google. It is -the- main search engine now. To not have your site on Google is very harmful to business. Even ignoring the lost business from people who just don't find the site, there will be those who stop to wonder why Site X is on the other search engines and not Google. As Google has recently stated that they are removing malware sites, that will be the first thought of a lot of people. Last on anyone's list will be 'They didn't want Google indexing their content' because it's idiotic.

And if you think there's nobody else out there with your product that's willing to play along with Google's rules, you're an idiot.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Re:What right do they have to limit crawlers? by Anonymous Coward · 2007-12-03 08:10 · Score: 0

For instance, if I downloaded the program, not wrote it, there would be no way I could know it was violating contracts.
Then perhaps it's irresponsible to run it. Your computer is your agent. When you elect to execute code, you should own up to what it does. Scary? Then don't run it.
$CAR_ANALOGY here.

Go for it, publishers! by xigxag · 2007-12-02 09:00 · Score: 4, Insightful

I understand completely. I too would like to stop my nosy neighbors from peering at me out of their window when I leave my house in the morning. My plan is to implement "pay per stare" at some point in the future but they aren't gonna pay if they can get their jollies for free. I blame the "Sun" and "street lamps" and "glass" and other devices that interfere with my ability to effect sole distribution over the intellectual property that is my personal image. Well, at the very least, I should be able to sue torch/flashlight manufacturers into oblivion and then use my deserved winnings to tackle the big boys 150 gigameters away.

--
There are two kinds of people: 1) those who start arrays with one and 1) those who start them with zero.

Re:Go for it, publishers! by QuasiEvil · 2007-12-02 09:33 · Score: 1

I blame the "Sun" and...
Yup, I blame Sun for a lot of my problems, too - mainly Java.
Re:Go for it, publishers! by hal9000(jr) · 2007-12-02 09:59 · Score: 1

I too would like to stop my nosy neighbors from peering at me out of their window

I wish I had thought of this. We used to have these crazy neighbors across the street who would come out on their porch whenever my wife and I would go out our front door. They would just watch. Fucking neighbor TV. When I was bored, I'd walk in and out 10-15 times a day. At least we were both getting our exercise.
Re:Go for it, publishers! by Dunbal · 2007-12-02 10:09 · Score: 1

Today if you act suspicious, your neighbor will call the fbi and report you as a possible terrorist.

--
Seven puppies were harmed during the making of this post.

what to display by Grampaw+Willie · 2007-12-02 09:05 · Score: 1

from an HTML page I like a well written title

Re:what to display by BenoitRen · 2007-12-02 12:02 · Score: 1

Untitled Document

Lazy scum by ackthpt · 2007-12-02 09:07 · Score: 1

Don't want it shown, then hide it. Lazy fuckers should learn about structuring content rather than bitch about search engines.

--

A feeling of having made the same mistake before: Deja Foobar

robots.txt a W3C issue by m94mni · 2007-12-02 09:07 · Score: 5, Informative

Note that robots.txt, favicon.ico and /w3c/p3p have been raised as issues for the W3C Technical Architecture Group:

http://www.w3.org/2001/tag/group/track/issues/36

See Tim B-L's original mail here:

http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093

One can only hope that any new efforts keep this issue in mind (hint: stop polluting *everyone's* namespace!).

Re:robots.txt a W3C issue by vrmlguy · 2007-12-03 02:25 · Score: 1

The good news is, ACAP extends the current robots.txt, so it doesn't pollute the namespace any more than it already is.

--
Nothing for 6-digit uids?

Viewable content should help, shouldn't it? by blackraven14250 · 2007-12-02 09:13 · Score: 1

Doesn't it make more sense to show your content on a search engine? Help people find what they're looking for? And get a view of what's being searched? Or does this apply more to paid publications?

Re:Viewable content should help, shouldn't it? by Spad · 2007-12-02 09:28 · Score: 1

It comes down to one big case of "It's not fair, they're able to read my freely available content without being forced to watch my adverts or register their (fake) personal information".

If search engines follow this ACAP standard and no longer index more than a tiny snippet of the content, then nobody will be able to avoid adverts or avoid registration ever again.

Good by 0123456 · 2007-12-02 09:18 · Score: 1

These days I find most things on the web by searching, not by following links. If these people want to cut themselves off from the world by refusing to allow search engines to catalog them, why not? People whose work is inaccessible to most because their publishers refuse to let it be on search engines will soon decide that they no longer need a publisher.

roll your own? by Anonymous Coward · 2007-12-02 09:18 · Score: 0

CPU cycles and storage are cheap now, you'll just have to run your own search engine server..

Hoisted by their own petard by hal9000(jr) · 2007-12-02 09:19 · Score: 4, Insightful

I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed. It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not to visit the site after all?

Fine. But as we all know, we probably have a few sites that we bookmark and visit often. We probably get a lot of news from RSS. But a lot of people are directed to sites via search engines. So if a content producer, say a newspaper, doesn't want its content indexed, then fine. It will only result in a LOSS of traffic to their site.

Look, content producers have to make money. They have people to pay, stuff to print, etc. They have expenses. It is truly sad that rather than trying to figure out how to make content relevant and useful, some content producers simply want to continue analog methods in a digital world.

Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.

Re:Hoisted by their own petard by Anonymous Coward · 2007-12-02 09:52 · Score: 0

Look, content producers have to make money. They have people to pay, stuff to print, etc. They have expenses.

Then they should put their content behind a subscription wall. If their content is worth anything, then people will pay to see it. If their content isn't distinguishable from what any other site offers, then they'll go out of business.
Re:Hoisted by their own petard by hal9000(jr) · 2007-12-02 10:03 · Score: 1

Then they should put their content behind a subscription wall. If their content is worth anything, then people will pay to see it. If their content isn't distinguishable from what any other site offers, then they'll go out of business.

That is one model, but for consumer-ish stuff like news, folks have grown used to getting it free from TV, radio, and even on-line. Peeps just aren't that interested in paying for it, so the OTHER model is to make money from advertising. So the theory is if a user reads a cached copy or a summary elsewhere, then the content provider doesn't get the ad impression.

I agree w/you that if a content producer has good content, people will read it, return to the site, and heck, may even sign-up, BUT that is beside the point.
Re:Hoisted by their own petard by grcumb · 2007-12-02 10:08 · Score: 2, Insightful

I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed.

I'm going to assume that you actually mean "...is being indexed."

It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page. Egads, what if you read the summary and decided not to visit the site after all?

Your tongue-in-cheek tone is noted. But at the end of the day, the Internet doesn't allow the kind of control old-school publishers want. Not only is that horse gone, there's no barn left to put it back in, if we ever did manage to find it.

It's an unfortunate fact of life that these people need to have a smart, communicative geek (like, say Larry Lessig) sit down with them and explain that a fundamental aspect of digital information is that it can be replicated with virtually no effort and next to no cost. Additionally, the Internet is a point-to-point network. It is agnostic by design, and works only as long as we accept that we have more to gain by getting along together than by working alone, following our own arbitrary rules. (Make sure Ballmer's not in the room when you get to this part.)

People like to talk a lot about the Tragedy of the Commons, but the one thing the Internet teaches us is that it's a fallacy where networks are concerned. The Internet is ubiquitous and effectively infinite - 'effectively' in the sense that there's always another copy of a given piece of information on the Internet.

This means that control is a pipe dream. The best we can do is use moral suasion to request that people respect our wishes with regards to particular content. We ceased to control it the moment we put it on the Net. The fact that most people actually do play nice is one of the miracles of online society.

--
Crumb's Corollary: Never bring a knife to a bun fight.
Re:Hoisted by their own petard by ScrewMaster · 2007-12-02 10:30 · Score: 3, Insightful

The best we can do is use moral suasion [google.vu] to request that people respect our wishes with regards to particular content.

Ha, yeah. I recently purchased a DVD where the opening scene showed a kid snatching a woman's purse and running off, with the voice of doom saying "You wouldn't steal a purse would you?" in a ridiculous, nay, pathetic attempt at "moral suasion". I was then subjected to several more unskippable minutes of this asinine lecturing, various legal threats, plus a couple of movie previews and advertisements that I couldn't skip past either. What the hell? So by the time I reached the main feature, I was so irritated (seeing that I'd just paid sixteen bucks for the damn disc) that I pulled the disc from the player and fired up DVD-Shrink. Half an hour later I had a re-authored copy without all the crap, and that's what I watched.

Idiots.

--
The higher the technology, the sharper that two-edged sword.
Re:Hoisted by their own petard by Anonymous Coward · 2007-12-02 12:05 · Score: 0

Brilliant, isn't it? When you download a movie it Just Works, but if you buy one on DVD it repeatedly lectures you about theft. I wonder if movie companies use any science to validate the efficacy of their "solutions", because it seems to me that producing a product worse than the pirates do is...stupid. Here's a possible alternative: put something like "Please, Reward Us For Our Work!" at the end of the movie.
Re:Hoisted by their own petard by ScrewMaster · 2007-12-02 13:17 · Score: 1

I wonder if movie companies use any science to validate the efficacy of their "solutions", because it seems to me that producing a product worse than the pirates do is...stupid.

Well, let's face it ... these aren't the sharpest knives in the drawer.

--
The higher the technology, the sharper that two-edged sword.
Re:Hoisted by their own petard by LordLucless · 2007-12-02 14:29 · Score: 1

I know my position is very un-slashdotish, but there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed. It's not that they don't want you to see the content, it's that they want to control how you see that content. They want it wrapped in their page, with ads, and not summarized on a search page.

Yes, there is something wrong with it. They published it in a public medium. The snippet that search engines used to provide context is a clear case of fair use, which is why the content providers are whinging and not suing. It'd be analogous to publishers bitching about the reviewers putting all the good quotes in their reviews, so that nobody buys the book. If all the content you have to offer can be summed up in a 20-word blurb, your content sucks and your site deserves to die. If they don't like it, then publish behind a members-only area, or use robots.txt to stop the spidering.

Note that this is different to a search engine caching an entire page/site, where the fair-use case is a bit harder to make.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
Re:Hoisted by their own petard by SanityInAnarchy · 2007-12-02 17:04 · Score: 1

there is nothing wrong with content producers wanting to control how their content, in particular, the stuff they paid to generate, from being indexed.

True, nothing wrong with wanting. But as Jayne says, "If wishes were horses, we'd all be eat'n steak!"

(I know that's not the original quote, but I like Jayne's version.)

Gee, just a thought, but what about a way to display a summary and an ad chosen by the content producer along with the summary? Advertisers would spend lots for that kind of exposure.

Yes, they would. And if they got it, they'd find out very quickly how stupendously bad an idea it is.

You see, the reason Google's on top right now is that they were the simple, fast search engine. When it was between Google and Yahoo, Yahoo's homepage was almost as big as Slashdot's, with a tiny little "search" box somewhere, maybe, and Google was just better at getting results to you quickly anyway.

In other words... As much of a monopoly as they have, if Google ever does implement something like that, I'm switching to another search. Windows Live search, even. I imagine enough people would do that, or block them, that the "value" of those ads would plummet.

--
Don't thank God, thank a doctor!
Re:Hoisted by their own petard by Fulcrum+of+Evil · 2007-12-02 17:30 · Score: 1

reminds me of a boondocks ep (the one where they sneak food into a theater) - instead of the normal FBI lecture, it showed some punk beating the everloving shit out of a grandma and taking her purse, then said "stealing a movie is exactly like beating up an old woman", while riley tapes the whole thing.

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Re:Hoisted by their own petard by Viceice · 2007-12-02 17:47 · Score: 1

Because I, along with a lot of people on the internet, will refuse to click on a link when I don't know where it'll lead. Call it years of avoiding dodgy links that lead to nothing but spam and advertisements.

So say I Google a piece of news, and I find a link with a summary and a thumbnail, versus a plain link with no description of what's to come. I'm going to click on the one with the summary, simply because I know I'm going to get what I want, and as we know from Google's success, giving people what they want instead of dictating things to them is the winning formula.

--
Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
Re:Hoisted by their own petard by Score+Whore · 2007-12-02 18:02 · Score: 1

It's an unfortunate fact of life that these people need to have a smart, communicative geek (like, say Larry Lessig) sit down with them and explain that a fundamental aspect of digital information is that it can be replicated with virtually no effort and next to no cost.

And yet millions or billions of people worldwide seem to want the man to hunt down fraudsters who pretend to be one of the aforementioned billions.
Re:Hoisted by their own petard by Score+Whore · 2007-12-02 18:07 · Score: 1

Yes, there is something wrong with it. They published it in a public medium.

Not really. They put it on their website with the understanding that the majority of people would be using a traditional web browser to access the content. It's not like they printed it on millions of flyers and carpet bombed every city and town.

If all the content you have to offer can be summed up in a 20-word blurb, your content sucks and your site deserves to die.

Search engines do one of two things: 1) they print the first couple of lines of the page, or 2) they print the sentences around the search words. Google News seems to do the first. And the amazing way that it seems to be a summary of the content isn't because Google has some cool and awesome English language generator, but because the original content is written in a structured way. The whole point of the first paragraph of the story is to summarize what comes next. It's like you're not educated or something.
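
To make the two approaches concrete, here is a rough Python sketch. It is only an illustration of the idea, not how Google or anyone else actually builds snippets; the function names and the naive sentence splitter are my own assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption for this sketch): break after ., ! or ?
    # followed by whitespace. Real engines do something far more careful.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lead_snippet(text: str, max_sentences: int = 2) -> str:
    # Approach 1: print the first couple of sentences of the page.
    return " ".join(split_sentences(text)[:max_sentences])


def keyword_snippet(text: str, query: str) -> str:
    # Approach 2: print the sentences that contain any of the search words.
    terms = [t.lower() for t in query.split()]
    hits = [s for s in split_sentences(text)
            if any(t in s.lower() for t in terms)]
    return " ... ".join(hits)
```

With a news story the first function tends to return the lead paragraph, which is why it reads like a summary: the writer already did the summarizing.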
Re:Hoisted by their own petard by LordLucless · 2007-12-02 18:17 · Score: 1

Not really. They put it on their website with the understanding that the majority of people would be using a traditional web browser to access the content. It's not like they printed it on millions of flyers and carpet bombed every city and town.

You're right. There's no way they could have reached as many people via carpet-bombing as they do via the web. Their website is a medium for communication, and it is open to the public. It is a public medium.

The whole point of the first paragraph of the story is to summarize what comes next. It's like you're not educated or something.

And yet, newspapers and magazines continue to print more than just the first paragraph of any story. Yes, the introductory paragraph is designed to provide a summary, but it's also designed to *hook*. To keep the reader interested, and flipping through more ad-filled pages. The content providers should be happy that Google consistently publishes their hook for them - it's the best thing (for them) that could be used as a summary.

In any case, my main point is still unaddressed. They published it, and Google has a fair-use right to use small snippets for the purposes of categorizing, in just the same way that you have the right to index your collection of DVDs by their copyrighted title. If they don't like it, don't publish it. Then nobody will be able to get their dirty little hands on the precious content.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face

Pointless by HalAtWork · 2007-12-02 09:22 · Score: 3, Insightful

If you put it on the internet, and users are meant to access it, why should search engines differentiate any content based on probably arbitrary criteria? If pay sites restrict content and give out special logins for paying users, search engines cannot index it and the content is kept 'private'. If a site has non-restricted content (not restricted by special login), then why shouldn't it be indexed? It would be a disadvantage to the end user, because they cannot find the content as easily (especially if the web site's search engine sucks) and it would be a disservice to the content provider, since their site would be less likely to show up in search results. What is the point? Is this the same thing as people disabling right-click on certain web sites to try and prevent you from 'stealing' content, the same content that is available in your cache, and that would be illegal to use if the content is copyrighted anyway? Is this the same thing as people embedding pictures in flash for the same reason? What if all of this results in less usable, less indexable content and more annoyances, just to restrict the way content is accessible and viewed?

Then that's not the web anymore, that's not really in the spirit of the internet... why not just stick to print or something? And then have it in a special store where you can only buy it with some currency you made up, with an exchange rate you control? Oh, and have a special door for the store that can only be opened with a special device you have to order! Er, anyway... I hope you can understand my point.

--
Twinstiq, game news

Re:Pointless by Anonymous Coward · 2007-12-02 11:18 · Score: 0

If pay sites restrict content and give out special logins for paying users, search engines cannot index it and the content is kept 'private'.

Except that the content providers want to "have their cake and eat it too." You want to require registration/payment for visiting the site, but how do you get people to register if they don't know about your site? There are a *lot* of sites you can find content for through Google search results which put that content behind a registration/paywall for "regular" people. Those sites need to show search engines (all) their content, so that they show up on as many search results as possible, but they want regular people to have to register to see it. (I'm assuming they either have a deal with Google, or serve different content based on IP/user agent data.)
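
For what it's worth, the "serve different content based on user agent" guess could look something like the Python sketch below. This is purely hypothetical: the token list, function names and view labels are made up for illustration, and I have no idea what any particular publisher really runs.

```python
# Illustrative crawler tokens only (an assumption). A User-Agent header is
# trivial to forge, so a real site would presumably also check the client IP.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_crawler(user_agent: str) -> bool:
    # Case-insensitive substring match against the known crawler tokens.
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def select_view(user_agent: str, registered: bool) -> str:
    # Crawlers and registered users get the full article so it can be
    # indexed or read; everyone else is sent to the registration wall.
    if registered or is_crawler(user_agent):
        return "full_article"
    return "registration_wall"
```

The obvious weakness is that any "regular" person can claim to be a crawler by changing one header, which may be part of why publishers want a more formal scheme.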

As I read it, this proposal is for a more formal scheme to allow search engines free access to the content so that they can index it properly, but to block out "regular" users from indirectly accessing the content without registering/paying first (if the providing site is so inclined).

The Genie.. by nurb432 · 2007-12-02 09:23 · Score: 1

Once Google admitted it can and will/does filter search results, it opened the floodgates for stuff like this.

Don't say I didn't tell you so....

--
---- Booth was a patriot ----

Re:The Genie.. by Anonymous Coward · 2007-12-02 10:42 · Score: 0

mod parent up

oblig... by doyoulikeworms · 2007-12-02 09:33 · Score: 3, Funny

Bustin' ACAP in Google's ass.

What a joke by WindBourne · 2007-12-02 09:33 · Score: 2, Informative

If these publishers want to own the search engines, then they should build their own! These engines do them a favor. This is no different than the music publishers trying to control the bands and how they get paid.

--
I prefer the "u" in honour as it seems to be missing these days.

Re:What a joke by roman_mir · 2007-12-02 15:26 · Score: 1

If these publishers want to own the search engines, then they should build their own! - yeah, we know, with blackjack and hookers, in fact forget the search engines and the blackjack.

--
You can't handle the truth.

Average people and news consumption by jhRisk · 2007-12-02 09:37 · Score: 3, Insightful

I think the mistake we're making here is that we're assuming most folks consume their news like we do. Sorry to generalize, but I believe most of us seek to become informed and thoroughly review and critique what we read. However, most people are satisfied with tidbits and in fact want nothing more. For example, the macabre are satisfied with a headline like "Multiple Car Accident Kills 50" and a thumb of the pile-up... the nosy like "Brad Wears Ugly Glasses For the First Time" and a thumb... etc. Yes, those are terrible headlines and hyperbole to make my point. Imagine a search engine unlike Google which provides summaries of multiple sources, offering these tidbits in a single page without the source's ads. Oh wait, http://www.ask.com/ and perhaps others, although I'm stating solely that they have such a type of offering and not that they do so violating any rules.

I'm against most tactics that appear to be an organization seeking to squash an alternative or new and unknown element they think is encroaching on their bottom line, and this move smells of it, but I feel it's a rare case of smoke without an actual fire. Just wanted to throw that out there while I seek more info on this tidbit.

--
That's just my POV... no more, no less.

It's not a bad thing. by 91degrees · 2007-12-02 10:12 · Score: 2, Interesting

Seems to be a lot of people slightly upset over this. But I think this is a good thing. They already have the ability to stop search engines from indexing at all. Now they have much more fine grain control. They can also make their results more useful by setting expiry dates. Presumably they'll also be able to be more specific about what the summary says, and might actually be more useful.

Now some sites will probably want to over control, but they'll lose out.

Re:It's not a bad thing. by eskayp · 2007-12-02 12:04 · Score: 1

If this approach to throttling fair use was good enough for the RIAA and MPAA to use on audio media and video media then it should most certainly be appropriate for other more archaic content.
Let us know who signs up so we know which venues and corporations to avoid.

--
I didn't desert Windows; Windows deserted me: BSOD
Re:It's not a bad thing. by jvkjvk · 2007-12-03 03:21 · Score: 1

Seems to be a lot of people slightly upset over this. But I think this is a good thing. They already have the ability to stop search engines from indexing at all. Now they have much more fine grain control. They can also make their results more useful by setting expiry dates. Presumably they'll also be able to be more specific about what the summary says, and might actually be more useful.

Now some sites will probably want to over control, but they'll lose out.

Why should we want them to have "more fine grain control"? Do you really think that they are doing this to provide you with more useful results? Do you really believe that if Google was forced to follow ACAP dictates on every page that this wouldn't turn out to be a huge mess?

I can't believe that anyone on slashdot really believes that giving content providers more control over what they provide would be a good thing. Is anyone here naive enough to believe that this would be used for something other than a new way to drive revenue at the expense of everyone else?

Oh, here's the driver, from TFA:
"The free riding deprives AP of economic returns on its investments,"

Yeah, this will turn out well. I personally think that the parent's sentiment that "some sites will probably want to over control, but they'll lose out." is too optimistic.

I also don't expect this to take off without laws backing it up. It's really the only way it would work. Anything that one search engine does voluntarily to hamstring their search results will only lead to users bailing to another search engine that fits their needs better. Unless this metadata is a clear benefit to an end user, legal threats are probably the only way to get search engines to accept a degradation in their results.

It seems the ultimate end play is that the T&C provided on the corp sites would be ultimately a binding contract and a machine readable version of that would be encoded - something like the ACAP proposal. Also that by accessing the web page, the user or machine agrees to be bound by those T&C.

If this seems reasonable, think for a second about EULAs and then get back to me.

Just so I'm clear by Jay+L · 2007-12-02 10:13 · Score: 4, Insightful

A bunch of publishing organizations have gathered together and are attempting to create an Internet standard for restricting searchable content.

They haven't involved Google, Yahoo, or Microsoft in the process. In fact, the only search company they mention in their FAQ is Exalead, who I didn't think I'd even heard of (though now I think I may have once downloaded their desktop trial product).

This is going to be implemented how?

In related news, I have issued a new policy for how I (and anyone who joins my club) am to be treated in airport security lines. I will be publishing this policy on my home page, and I am certain it will win widespread adoption among travelers.

Q:Have you discussed this with security administrators?

A:In addition to the many travelers who have co-signed the new policy, we have an agreement-in-principle from Madge, the security and commissary chief at the fourth-largest regional airport in greater Bozeman.

Re:Just so I'm clear by Looking+Confident · 2007-12-02 18:26 · Score: 1

ACAP (I feel) is simply about protecting a content originator's rights (its IP) pertaining to content that has been published by them. And just who they (a publisher) then may decide, is permitted by that fundamental Law (associated with that Intellectual Property RIGHT that they legally "own"), to publish in FULL or, PART, accordingly. Has it "dawned" on anyone that "they" (publishers) may have "devised" a better, more cost effective way that they can get to "share" their original content (with each other?) and support their "own" Ads Networks in their doing so? That they (those represented by ACAP) also may feel the collective billions in annual "lost" Ads revenues from their original "hard copy" (to the net & specifically SE's, to date) may even be (mostly) recoverable, in their doing so? And if this is the case, who could blame them for doing so? Simply moving to protect what is theirs. I know of some (talked of, over) 4 million websites already, that are 'in place' to receive their "work" (current and historical) in what would appear on the surface, a much better approach being made than is their current situation. http://www.vortal.com/news_results.php?search=tech%20ACAP To me, ACAP is very clear and long overdue as a means of protecting original content be it current or, historical. :) LC

Re:So they tell you what they don't want you to se by Anonymous Coward · 2007-12-02 10:29 · Score: 0

There's a lot more to be said about the Coward part of that. When you post here on slashdot, Anonymous Coward just means you don't wish to be specifically identified. There's not a whole lot of coward, there, just a joke from way back. With this, the publishers are being incredibly cowardly: They're publishing material, but hiding it from view. There's no good reason that publicly accessible content on their site should be 'hidden', and that's the most cowardly thing of all.

Re:So they tell you what they don't want you to se by CarAnalogy · 2007-12-02 10:46 · Score: 1

Added complexity (and user control) most likely introduces new possibilities for abuse, which is already a major problem for search engines these days.

ACAP Tagline: Unlocking Content for All by Serhei · 2007-12-02 10:49 · Score: 1

*cough*

Re:So they tell you what they don't want you to se by morgan_greywolf · 2007-12-02 10:49 · Score: 3, Interesting

Specifically, this seems geared towards sites like Google News that aggregate stories and then publish snippets of them on their home page.

Personally, I don't really see the problem. You either want your site spidered or you don't. You don't get to control the presentation of the data that is spidered, only the search engines get to do that.

So the thing here is that Google takes its ordinary web spider, applies a little magic to it, and then displays the results as a news page. Big deal.

You either want your site spidered or you don't. You can't have your cake and eat it too.

--
My blog

*doh* by ubrgeek · 2007-12-02 10:57 · Score: 1

> The desire for greater control over how search engines index and display Web sites

Then design your sites better. Seriously. When I was on the team that launched http://jacksonville.com/, we spent a decent amount of time thinking about how to optimize our site for search engines, and that was 10 years ago. Too much showing? Not enough showing? Spend more time developing and designing your site ... instead of trying to emulate your print product (ahem ... *cough http://nytimes.com/ cough*)

--
Bark less. Wag more.

A prediction by Anonymous+Brave+Guy · 2007-12-02 10:58 · Score: 4, Insightful

That being said, I submitted this to point out the misstep I think publishers are taking. Search engines and aggregators drive their business, and usually they do it for free. I don't understand why anyone would think it would be a good idea to mess with that.

This being Slashdot, I predict that huge numbers of people will now arrive in this thread and say that you're absolutely right, the search engines are providing a great service, and the publishers should just suck it up because they'd die without them.

The thing is, they're completely wrong. It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own.

For you or me, as someone who wants to know what's happening today, we can do one of two obvious things using a web browser. We can visit a specific news site we already know about (or at least guess at a URL), or we can start with an aggregator like Google News. Either way, many people will only read the headlines and summary for most stories. Either way, someone had to go out and get the information to write that story. But in one case, the people who brought the knowledge to the public get the page hit, while in the other, the search engine gets the hit in exchange for ripping much of the value of the other sites' content and the people who actually provided the content get nada.

It's common at this point for someone to pipe up with a fair use argument, but again, they are wrong, for the simple reason that while the headlines and summaries on news aggregators may only be small excerpts from the entire article, they represent a very significant chunk of the value. You can easily determine this by observing the proportion of users who look something up on an aggregator and never follow through to read any article in more detail; I don't know exactly what the answer is, but I'll wager it's a substantial proportion, perhaps even the majority.

Another common argument is that the news sites would die without input from search engines, but again I can't believe this is really true. When I reach lunchtime at work, I do not visit Google to find the BBC News web site, I just type in news.bbc.co.uk. (Actually, I visit the bookmark, but the first time that's what I typed.) Google, or any other news aggregator, is wholly unnecessary to my finding the main news site. Even without that, I could easily have guessed that the BBC News web site could be reached at www.bbc.co.uk/news or news.bbc.co.uk, either of which would have got me there immediately. The site is advertised via the BBC's other media as well. A significant proportion of the links I e-mail to and receive from friends and family are direct links to stories on the site.

Basically, if every search engine on the planet disappeared tomorrow, I rather doubt the big news services would care. As with everything else to do with search engines, they are just a middleman service, and one that is entirely expendable. If they weren't around, the Internet community would just develop an alternative or five, probably rather quickly, just as it always does.

On the other hand, if the big news services stopped providing news tomorrow, aggregator services like Google News would be completely dead, because they provide absolutely no value in themselves. They simply scrounge content from one source and visitors from another, and insert themselves as a middle man to cream off some of the profits.

The very fact that one service could survive quite happily without the other, while the other would die immediately without the first, tells us everything we need to know about the merits and public service benefits of each. That being the case, I find it hard to argue with the publishers' position that the news aggregators are basically ripping them off, and I don't really have much sympathy with the two most common counter-arguments people seem to be making in this Slashdot discussion.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.

Re:A prediction by allthingscode · 2007-12-02 11:21 · Score: 2, Interesting

If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them. Google doesn't supply any content, but it does supply a service: It's the first place people go to find out information. If they need more than a summary, they can click on links from the summary page to get details. People aren't going to go to ten websites to look for something if they can start at one place.
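
For anyone who hasn't looked at one, blocking all access really is a two-line robots.txt. Here is a small sketch using Python's standard `urllib.robotparser` to show how a well-behaved spider would read it; the example.com URLs and the helper name `may_fetch` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that tells every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A more selective one: only the archive is off limits.
BLOCK_ARCHIVE = """\
User-agent: *
Disallow: /archive/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    # Parse the rules from a string instead of fetching them over the
    # network, then ask whether this crawler is allowed to request the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that this only works for spiders that choose to honour the file; nothing in robots.txt enforces anything.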

You are right: If the search engines disappeared, the big news services wouldn't care. Actually, they would probably enjoy it, because people would go to the New York Times, Washington Post, and other big names sites rather than seeing these smaller sites with better reporting and commentary. But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

If the news sites want to control their content better, fine. But I guarantee you the next whine you will hear from them is how Google isn't directing traffic to their websites and it must all be retribution by Google for being made to limit what it displays, rather than people clicking on sites where they can read the summary.
Re:A prediction by timmarhy · 2007-12-02 11:39 · Score: 1

Value has NOTHING to do with fair use.
News isn't the only thing search engines provide. 99% of the internet's content would go unnoticed if it wasn't for them, hence why Google ads is a billion dollar revenue stream. If news sites cut themselves off from Google, they will be the ones to lose traffic and customers. Clue yourself in a bit please.

--
If you mod me down, I will become more powerful than you can imagine....
Re:A prediction by buswolley · 2007-12-02 11:49 · Score: 1

I have a pile of sand and some water. I didn't make them. Maybe God did, or the sun, but it was not me.
Now I build a sand castle by gathering sand and water together.
This is what news aggregators do for me. Aggregators are in the service of providing "emergent content."
Aggregators amass information relevant to some topic so that the big picture can be seen. This picture emerges after the aggregation of information (the sand) is structured so one can see the whole picture (the sand castle).

--
A Good Troll is better than a Bad Human.
Re:A prediction by Anonymous+Brave+Guy · 2007-12-02 12:00 · Score: 2, Insightful

If the news and book sites wanted to keep the search engines out, they would just set up their robots.txt files to block all access. Then they would never show up on Google. They don't want to do that because they know it would be death to them.

I'm not at all convinced that's really true. To borrow a related copyright-area theme, it's like the RIAA saying that they have to use DRM, because otherwise no-one will buy legal copies of their stuff. It's just an assumption, which they aren't yet willing to risk violating in case it goes wrong. That doesn't necessarily mean that if they had no choice but to work on a different basis, they'd lose out.

But you contradict yourself as well. You say that if the search engines disappeared, the internet would just create more, but then you say that if the big news services stopped providing news, the search engines would die. No they wouldn't. The internet would create more, filling the need.

Actually, I said the community would create alternatives. I have been rather sceptical about the real benefits brought by the search engines for a long time, principally because of their tendency to leech off the most valuable content from others, without ever giving any back themselves. I'm also not convinced they're particularly useful anyway these days; they are so frequently gamed by sites taking advantage of SEO that I find fewer and fewer useful sites on there and turn more and more to following links from other sites, recommendations from friends, and so on.

For an idea of how far this can scale, consider the nature of linking in the world of blogs: popular articles get widely cited very fast, and the quality of links is generally equal or better than the source site since the links are all hand-picked. A good blog can develop a huge readership in a matter of months, and the whole system is one big meritocracy right down to the level of individual articles.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:A prediction by Anonymous+Brave+Guy · 2007-12-02 12:29 · Score: 2, Insightful

You're imposing blanket assumptions on a specialist niche, which is never a smart thing to do. TFA is talking about news sources, and so is everyone else in this discussion. What anyone else on the web does is pretty much irrelevant here.

And in any case, you're wrong about the value. In the US, which has one of the most liberal fair use regimes in any jurisdiction today, whether the copy being made affects the value for the original is a major question when deciding whether the copy constitutes fair use. In most other places, the restrictions are far tighter anyway, which I suspect explains the settlements referred to in the article.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:A prediction by zantolak · 2007-12-02 13:20 · Score: 2, Interesting

The organized synthesis and presentation of this content is, in itself, useful content. The number of people using news aggregators should have clued you in on this.
Re:A prediction by Anonymous+Brave+Guy · 2007-12-02 14:54 · Score: 1

I wouldn't know. I'm not sure I've ever met someone who actually uses these services. Pretty much everyone I see using the web just goes straight to their favourite source(s).

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:A prediction by SnowZero · 2007-12-02 21:58 · Score: 1

Another common argument is that the news sites would die without input from search engines, but again I can't believe this is really true. When I reach lunchtime at work, I do not visit Google to find the BBC News web site, I just type in news.bbc.co.uk. (Actually, I visit the bookmark, but the first time that's what I typed.) Google, or any other news aggregator, is wholly unnecessary to my finding the main news site. Even without that, I could easily have guessed that the BBC News web site could be reached at www.bbc.co.uk/news or news.bbc.co.uk, either of which would have got me there immediately.

If what you said above is true, then the news sites could simply deny access to spiders with their robots.txt. It wouldn't affect their business too much, and they could prevent those pesky aggregator sites from nicking their content. Of course the news sites have not done this, which leads me to believe that they gain more in traffic than they lose through aggregators. Otherwise their current actions would not make economic sense. Providing additional revenue for the news providers while at the same time offering features for users is not what I would call "useless". Of course, news sites would like new standards that let them have their cake and eat it too, but the actions of the news providers in the current system seem to be at odds with the world view that you are describing.
Re:A prediction by JuanCarlosII · 2007-12-03 00:02 · Score: 1

And what exactly do you suppose /. is if not a news aggregator?
Re:A prediction by budgenator · 2007-12-03 00:37 · Score: 1

It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own.
The news aggregators produce what I want, links to topics I'm interested in, so in return they produce my eyes; if you and the rest of the fucktards in the moron-media can't get that through your thick skulls it's your loss. The majority of what passes for news is the lemmings regurgitating that which wouldn't be worthy of a high school newspaper; of course the only one that came even close to getting it right was the one local stringer that the lemmings are following, but what he did was lost because the nationals edited it to death in order to make it "newsworthy".
How much of "it's a tense situation at blank, little is known about the blank, the press conference scheduled for blank is late" can you stand?

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Re:A prediction by Anonymous Coward · 2007-12-03 01:17 · Score: 0

> news aggregators produce no useful content of their own.

Neither do most of the big media sites ( with some exceptions, such
as the BBC ).

Next time you read a news story, check the attribution. It'll most
likely be a wire feed from Reuters or Associated Press.

Many ``news'' companies do without any field correspondents. Even
The Economist doesn't have dedicated correspondents, but relies on
freelancers and double-billers from other papers.
Re:A prediction by Anonymous+Brave+Guy · 2007-12-03 01:21 · Score: 0, Troll

I can't speak for anyone else, but I come to Slashdot for the discussions. I've usually heard any big news long before it hits the front page here, because I also read original sources.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:A prediction by TheRaven64 · 2007-12-03 02:23 · Score: 1

I mostly agree with your post, but I have a question for you. I don't visit the BBC news site directly, I collect their RSS feed, scan that, and read the occasional article in detail. What do you regard as the difference between my doing this and using a service like Google news which does the same kind of thing?

--
I am TheRaven on Soylent News
Re:A prediction by Anonymous+Brave+Guy · 2007-12-03 03:06 · Score: 1

Ouch, whacked with a Troll mod for giving an honest reply to presumably an honest question. Well, sorry Mr Moderator, but it's still true.

Actually, I used to submit quite a few stories to Slashdot when I came across them in other media, but it seemed like most of the time I just got rejected and then someone else's version of the same story got posted a day or two later. I've pretty much stopped submitting new stories from any mainstream sources now, since it seems a lot of people do, but I occasionally still submit interesting ideas I find elsewhere. I don't just copy half the original article text verbatim, though.

In any case, it doesn't take a genius to observe that most stories that get linked from places like Slashdot and Digg come from relatively few sources. Since I follow many of those sources directly anyway, is it really so hard to believe I might see a lot of the original articles before they hit the front page of Slashdot?

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:A prediction by Gallowglass · 2007-12-03 03:28 · Score: 2, Insightful

You wrote: "I'm also not convinced they're particularly useful anyway these days..."

And yet, every time I Google, I find what I'm looking for. To my mind, that's useful.
Re:A prediction by pizzicar · 2007-12-03 03:39 · Score: 1

"The very fact that one service could survive quite happily without the other, while the other would die immediately without the first, tells us everything we need to know about the merits and public service benefits of each."

Survival and growth are two different things. If we take Google News as an example - I will see a news headline and perhaps see wording that provides a different slant on a story. That is motivation for me to click on - and travel to - a news site that I would not otherwise have visited. Without this type of service, I would just be visiting one or two news sites for all of my news - and never know that "another side of the story" exists.

Hrmmm... by superdan2k · 2007-12-02 11:10 · Score: 1

...there's nothing quite like watching the traditional, embattled news sources "innovate" themselves right out of existence. They were slow to respond to the web and didn't understand it when they first did (they've gotten better), and now they're going to ACAP themselves into obscurity. Way to go guys! You're the bleeding edge of reporting!


Yeah, but... by Anonymous Coward · 2007-12-02 11:10 · Score: 1, Interesting

Those things are stupid. Were I Google, I'd put up something on my website that made them consent to my terms, or forgo indexing entirely. I can't blame them for wanting more control, but I don't think they should get it. I don't trust them at all.

Makes no sense by mattwarden · 2007-12-02 11:50 · Score: 2, Insightful

If search engine caching of their content is hurting these publishers, then they would use currently-supported methods to keep crawlers out:

User-agent: *
Disallow: /

Oh, but that's right, they do want to be indexed in search engines because it increases their revenue.

So, what's the problem, again?
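
For anyone who wants to check what those two lines actually do, here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleBot" and the example.com URL are made up for illustration; nothing here comes from the article or from ACAP.

```python
from urllib.robotparser import RobotFileParser

# The blanket opt-out quoted above: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URL are hypothetical placeholders.
story_url = "http://news.example.com/world/story.html"
print(can_crawl(ROBOTS_TXT, "ExampleBot", story_url))  # blanket opt-out
print(can_crawl("", "ExampleBot", story_url))          # no rules at all
```

A well-behaved crawler that sees the first file stays away entirely, which is the existing opt-out the publishers have chosen not to use.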

Re:Makes no sense by Anonymous Coward · 2007-12-02 23:28 · Score: 0

Oh, but that's right, they do want to be indexed in search engines because it increases their revenue.

So, what's the problem, again?

That they want you to click on the link, but they don't want you to know what's behind the link until after you click on it. If you know what's behind the link, you can make an informed decision that you aren't interested, thus "depriving them of their rightful ad income".

It hasn't occurred to them yet that not showing what you are clicking on will just make you even more likely to skip that link. After all, people buy dead tree newspapers without any idea what's inside.

They want to do away with informed choice, and go back to the world of dead-tree newspapers.
Re:Makes no sense by mattwarden · 2007-12-03 08:13 · Score: 1

Interesting point about the newspapers

I'm confused. by etnu · 2007-12-02 12:43 · Score: 1

What is it that they're losing by having this information indexed on Google again? It's just a summary; if they want to read the article, they still have to go to the target site. If Google didn't index their content, they'd get a lot fewer readers. It's not like people are going to Google news and then deciding that they got enough information from the summary to not bother reading the article.

Re:I'm confused. by ScrewMaster · 2007-12-02 12:53 · Score: 1

I think this is more akin to the way television networks play games with commercials in an attempt to reduce channel switching. Rather than have people type in a generic query to a search engine and choose among the many results (which will often include competitors) they'd like to make search less useful, thereby driving customers directly to individual news sites. I mean, from their perspective it's better if you pick "ABC News" to get your news fix, rather than go to Google first and search everybody else too.

It's a hopeless endeavor from the get-go, but this is the same mindset that thought it could stop filesharing with lawsuits. Stupid and shortsighted, and an utter denial of reality.

--
The higher the technology, the sharper that two-edged sword.

Re:Why are you here? A better prediction. by Anonymous Coward · 2007-12-02 12:50 · Score: 0

i love it when you karma whore, willy. it gives me a woody every time.

Re:So they tell you what they don't want you to se by Anonymous Coward · 2007-12-02 13:23 · Score: 0

Hmm, I wonder how long before someone opens a search engine that indexes only what is "hidden" (yeah, really...) by the ACAP settings.

I was curious, so I tried it on a few sites. Interesting stuff. Error logs, admin login pages, cms logins, "test" pages,... I didn't try logging in to any of them, but I do wonder how hard it would be. I'm guessing not very.

So where's the RFC? by Animats · 2007-12-02 13:26 · Score: 1

If these guys want anybody to pay attention, they should submit their protocol as an RFC. Their "standards document" is badly written. It has statements like "Features that are ready for implementation now, but only for use in crawler communication by prior arrangement, are labelled with an amber spot. These represent a minority of extensions for which there are possible security vulnerability or other issues in their implementation on the web crawler side, such as creating possible new opportunities for cloaking or Denial of Service attack." One such problem is that they stuck in a redirect mechanism that directs the crawler to pull data from another domain. Then they put in mealy-mouthed phrases like "It is recommended that, if possible, the URI should normally specify a resource within the crawled resource and not external to it, as this is less likely to present technical and security difficulties to the crawler.". This reads like something from a committee that doesn't have to make it work. They need to formally address the issue of the security scope of a robots.txt file, not hand-wave around the problem.

That's no good. Somebody competent and familiar with IETF procedures will have to overhaul this.

Car Analogy by Anonymous Coward · 2007-12-02 15:07 · Score: 0

If you run over some old lady, it's not the car's fault. Nor is it the car maker's fault. It's your fault for using the car poorly.

Give 'em what they want... by KwKSilver · 2007-12-02 16:00 · Score: 1

I found out about this story yesterday on (ta-da!) Google News. A little more searching (again via Google) led me to their web site. Interestingly, I could not find any information there about who constitutes the ACAP membership. The ACAP site lacks a search tool ... (surprise, surprise) so back to ... Google for more searching, which eventually leads to this page. No doubt Yahoo or MSN search would have led to the same findings. The Wikipedia article has a short list of the main suspects; doubtless there are others, like AP.

I just want to know who to add to my /etc/hosts file so that I don't accidentally view any of their sacred content. They don't want me to be able to find their stuff? Fine, I'll be happy to give 'em what they want.

--
If you want your life to be different, live it differently.

Agh the colors! by Draped+Crusader · 2007-12-02 16:17 · Score: 1

This might be because I'm slightly colorblind, but the colors on the ACAP page make my eyes bleed.

Re:Agh the colors! by Raideen · 2007-12-02 17:38 · Score: 1

This might be because I'm slightly colorblind, but the colors on the ACAP page make my eyes bleed.

[visits link] Hey, have you ever heard of induced colorblindness?
Re:Agh the colors! by zippthorne · 2007-12-02 20:04 · Score: 1

It's not you. They appear to have decided that yellow on beige is a good combo for headline text. For good measure, at the bottom of the page, there are some nice big blue arrows for advancing to some kind of following page or something. But they appear to have been lifted from a website with a white background.

--
Can you be Even More Awesome?!

Remove them from the index by bigtangringo · 2007-12-02 16:25 · Score: 1

If they don't like it, remove them from the index. Watch how fast they shut their pie-holes then.

--
Yes, I am a smart ass; it's better than the alternative.

what does google get out of this by aachrisg · 2007-12-02 16:29 · Score: 1

In my opinion, Google would be insane to agree to any restriction beyond telling the sites "if you don't want to be in Google, we already let any site opt out". Google has all of the power - if a site doesn't exist in Google, it does not exist.

So that's what they're bitching about? by SanityInAnarchy · 2007-12-02 16:36 · Score: 1

Ok, regardless of the legal fairness, I'd think removing those previews would actually reduce the likelihood of me visiting such a site.

Almost never do I see a Google result and say "Ok, I know all I needed to, not going to click." More often, I see one and say "Gee, looks like that site won't be very helpful, let's move on to the next one." I can only imagine my response would be like that, only more so, towards anyone who could allow Google to index them without allowing Google to summarize them.

--
Don't thank God, thank a doctor!

huh? by wap911 · 2007-12-02 16:49 · Score: 2, Insightful

What do they not understand about *DO NOT CRAWL*? robots.txt is just fine. If it ain't broke, don't try to fix it. So now I have to have a .robotaccess to go along with .htaccess?

"Hiding" most of the content in RSS feeds... by RhysU · 2007-12-02 16:52 · Score: 1

...makes me less likely to click through to their real story. Most of the major outlets seem to give only 1 or 2 sentences per feed item. That's so little information that I find I'm not interested in the story. I end up just browsing headlines. They've got to give enough that people want to read the articles. The traditional newspaper editing goal of cut-the-article-off-at-any-sentence-and-it's-still-complete is at odds with how and why I seek media coverage these days.

I have a better idea... by SanityInAnarchy · 2007-12-02 17:14 · Score: 1

Let's just not pay attention to them, even that much.

--
Don't thank God, thank a doctor!

Re:So they tell you what they don't want you to se by gbjbaanb · 2007-12-02 17:42 · Score: 2, Funny

I think it would have been better named Content Retrieval Access Protocol.

Complain/lock down with ACAP - Google should drop by Frank+T.+Lofaro+Jr. · 2007-12-02 18:24 · Score: 1

If a site complains or uses ACAP - Google should just drop them.

The Google "site death penalty" - you become (rightfully) irrelevant.

I wish I could set my Google preferences to exclude sites that use "noarchive" or "nosnippet".

Like those journals that feed Google the whole content but just give surfers a subscription page. Such as Blackwell-Synergy - I keep submitting them to Google's spam page since they do that - in direct violation of Google rules.

--
Just because it CAN be done, doesn't mean it should!

Woops by Anonymous Coward · 2007-12-02 18:34 · Score: 0

This is to clear out a mistake in moderation that I have no idea how to clear out otherwise...

Removing *newspaper* articles from the web ? by Anonymous Coward · 2007-12-02 21:21 · Score: 0

I'm sorry, but I do not quite understand: what is the net value (no pun intended) of yesterday's (let alone yester-week's) news?

What do these news-aggregators win by disallowing us, the people, from finding out what happened last week/month/year?

They have pushed their news to as many people as they could get it to, but somehow want this news to be gone in a week? Why?

What's the next step? Newspaper that dissolves in a week? Special software on a reader's computer that will erase all downloaded articles older than a set date?

Has anyone here actually read the new standard? by vrmlguy · 2007-12-03 00:52 · Score: 1

http://www.the-acap.org/download.php?ACAP-TF-CrawlerCommunications-Part1-V1.0.pdf
http://www.the-acap.org/download.php?ACAP-TF-CrawlerCommunications-Part2-V1.0.pdf

In brief, part 1 extends the robots.txt file, while part 2 extends the robot-related meta-tags. They allow spiders to be identified by both User Agent info and purpose (news, images, reviews). They add an "include" statement that can direct specific search engines to specific files, for example sending googlebot to robots/googlebot.txt; besides reducing bandwidth, this can confine any damage caused by coding errors. They also allow more granularity of indexing: You can specify if data from an old cache copy can be presented to a user, or if only the most recent copy should be used, and you can specify if links, snippets, thumbnails, or full content (i.e. a frame containing the originating site) can be shown to the search engine user. They add better retention controls; you can specify how long an engine should keep information (N days, until YYYY-MM-DD, or just until the next time the spider visits). And finally, they add a crude macro facility, so you don't have to create huge files that repeat themselves.

All in all, I don't see anything that's especially bad, and a lot of it is stuff that arguably should have been in robots.txt from the beginning.
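
To make the retention part concrete, here is a rough sketch of how a crawler might model the three retention modes mentioned above (N days, until a date, until the next visit). Big caveat: this is NOT the ACAP syntax. The Retention class, its field names, and the may_keep function are all made up by me for illustration, and treating the deadline day itself as still allowed is my own assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Retention:
    """Hypothetical model of a publisher's retention wish (not ACAP syntax)."""
    max_days: Optional[int] = None   # keep for at most N days after fetching
    until: Optional[date] = None     # keep until a fixed calendar date
    until_next_crawl: bool = False   # keep only until the spider revisits


def may_keep(rule: Retention, fetched_on: date, today: date,
             recrawled: bool = False) -> bool:
    """Return True if a copy fetched on fetched_on may still be kept today."""
    if rule.max_days is not None:
        # Assumption: the final day of the window still counts as allowed.
        if today > fetched_on + timedelta(days=rule.max_days):
            return False
    if rule.until is not None and today > rule.until:
        return False
    if rule.until_next_crawl and recrawled:
        return False
    return True


fetched = date(2007, 12, 1)
print(may_keep(Retention(max_days=7), fetched, date(2007, 12, 9)))
print(may_keep(Retention(until=date(2007, 12, 31)), fetched, date(2007, 12, 15)))
```

Nothing exotic, in other words: it is an expiry check that an engine would run against its index.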

--
Nothing for 6-digit uids?

Re:So they tell you what they don't want you to se by vrmlguy · 2007-12-03 02:35 · Score: 1

Will that someone also ignore the current robots.txt content? Because anyone using ACAP will soon migrate to a simple "Disallow: *" (in an attempt to influence your decision on whether to use the ACAP extensions), and then you'll find yourself indexing dynamically-generated, "infinite" web pages.

--
Nothing for 6-digit uids?

Wrong by Anonymous Coward · 2007-12-03 03:13 · Score: 0

Not all media are opt in.

Radio transmissions, for example. I can receive transmissions from France and Ireland if I use a directional aerial.

And for the internet, it CANNOT WORK unless the default is opt in: how do you get an agreement to view content? Ask. But you can't ask, your browser is asking. So how does it ask? Make a request, but that request is copyrighted and the response (even !NO!) is copyrighted too. Then again, you don't get "the" copy either. Each ISP will have to make a copy. So how do they do it? Then there's your local cache, which reduces redundant traffic (and therefore helps the ISP, the provider and the consumer) but requires agreement first.

So how can it work if it is a default "opt in"?

If you want it somehow different, DON'T USE HTTP. Requiring logins, using (S)FTP, or making up your own protocol will all allow you to work with opt-in.

So why index any of it? by Anonymous Coward · 2007-12-03 03:58 · Score: 0

What if Google says "OK, if you use those extensions, you're not getting indexed"? Copiepresse tried to sue Google for NOT including their content. Is there anything that says "you MUST index"?

Google Forensics by HTH+NE1 · 2007-12-03 07:01 · Score: 1

I'd like to chime in here and mention that I have used Google to read a story that had been dropped from a news site. Google didn't provide a cache link, the site refused to acknowledge that it had ever published the story (they sent it down the memory hole), but I had a couple phrases quoted elsewhere that I wanted to check context against.

So I did multiple Google searches looking for phrase segments, each one getting me one or two more words before or after the phrase. Eventually I was able to reconstruct two or three paragraphs. It took a few more searches to determine their proper order.

I'm wary of giving any site the ability to prevent such Google forensics automatically.
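
For the curious, the phrase-extension trick above can be sketched in a few lines. This is only a toy: make_snippet_search stands in for a real search engine by returning the quoted phrase plus a couple of words of context from a document the caller never sees directly, and the function names, the DOCUMENT text, and the three-word window are all my own inventions for the demo. It also assumes every three-word run in the text is unique, which real articles won't always satisfy.

```python
from typing import Callable, List, Optional


def _find(haystack: List[str], needle: List[str]) -> int:
    """Index of the first occurrence of needle in haystack, or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def make_snippet_search(document: str, context: int = 2) -> Callable[[str], Optional[str]]:
    """Toy search engine: return the phrase plus `context` words on each side."""
    words = document.split()

    def search(phrase: str) -> Optional[str]:
        target = phrase.split()
        i = _find(words, target)
        if i < 0:
            return None
        lo = max(0, i - context)
        hi = min(len(words), i + len(target) + context)
        return " ".join(words[lo:hi])

    return search


def reconstruct(seed: str, search: Callable[[str], Optional[str]],
                window: int = 3, max_rounds: int = 100) -> str:
    """Grow seed outward by searching for its trailing and leading words."""
    text = seed.split()
    for _ in range(max_rounds):
        before = len(text)
        tail = text[-window:]
        snippet = search(" ".join(tail))
        if snippet:
            s = snippet.split()
            pos = _find(s, tail)
            if pos >= 0:
                text.extend(s[pos + len(tail):])  # new words after the tail
        head = text[:window]
        snippet = search(" ".join(head))
        if snippet:
            s = snippet.split()
            pos = _find(s, head)
            if pos >= 0:
                text[:0] = s[:pos]  # new words before the head
        if len(text) == before:  # nothing new in either direction
            break
    return " ".join(text)


# Made-up text standing in for the vanished story.
DOCUMENT = ("the site quietly dropped the story after the minister "
            "denied every claim made by the reporter")
search = make_snippet_search(DOCUMENT)
print(reconstruct("after the minister", search))
```

Each round costs two queries and gains at most a few words per side, which matches the "one or two more words" per search described above. Working out the order of separately recovered paragraphs is a further step this sketch doesn't attempt.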

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?

You can't unprint a book by Anonymous Coward · 2007-12-03 08:20 · Score: 0

So even if you decide you don't want it in the PD, you can't unprint it to keep it out.

When you print a book, you cannot stop me lending it, whether you wish it or not.

Now, when you print it on the internet, you've lost most of your control. An example would be "if there are extensions to robots.txt, don't read ANYTHING". Copiepresse SUED Google for NOT indexing their site.

So how is ACAP going to help?

Why is it better than "require a login" or "don't publish in a PUBLIC PLACE"?

Metadata is information by lennier · 2007-12-03 11:39 · Score: 1

"The thing is, they're completely wrong. It's actually the other way around, for the simple reason that news aggregators produce no useful content of their own."

That's not actually true. The act of aggregation itself creates information: it brings to our attention news articles, and news sources, which we wouldn't otherwise know about. What they create is metadata, and that's hugely valuable.

In a perfect world where all sites used and respected HTML meta tags, or Dublin Core markup or something, and did so thoroughly, sensibly and never abused them or lied about the categorisation of content, perhaps we could get by without needing third-party search and aggregation services. Maybe. And in a perfect world where sites didn't store information in silly non-browseable formats like, e.g., PDF instead of HTML, we wouldn't need things like Google Cache to make them halfway readable. But we don't live in that world, and that's why aggregation exists.

I think an aggregator shouldn't strip out all links back to the original source, and should make it clear that there is more at the site to be investigated. But don't they do that anyway? One of the reasons why I don't read blogs via RSS yet is that I feel claustrophobic if I can't see the surrounding context of a blog: the skin, the about page, the comments (especially the comments). I guess it amazes me that there would be people who would *only* read, say, Google News or antiwar.com headlines and not read the full article.

--
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC

181 comments