Vertical Search Engines and Copyright

Posted by kdawson on Tuesday July 10, 2007 @08:20AM from the in-the-aggregate dept.

An anonymous reader writes "I am a big fan of Oodle, the online classifieds aggregator. I was disheartened when Craigslist announced that they would block Oodle from their site in late 2005 (old link), as I find their service very handy. I came across this page at the site of an aggregator of freelance job openings that summarizes the arguments around the legality of meta search engines and mashup-like sites and I found myself wondering if Oodle could have avoided the ban. There is an interesting argument there that seems to undermine copyright claims of user-generated content compilations. Are mashups legal? How does this affect sites like Digg or YouTube?"

39 of 62 comments

Content Aggregation and Mashups by blaster151 · 2007-07-10 08:26 · Score: 5, Interesting

In content aggregation lies all of my excitement about the future of the web (if people are allowed to continue being innovative and aren't prevented by heel-dragging by legal departments).

I don't even care if the aggregation happens server-side or browser-side. I want to be able to view a book product page on Amazon and click a "place local library hold" button. I want to be able to view my LiveJournal Friends page and have a superimposed queue and "recently watched" displays for those folks who are also my Netflix friends. Or current weather reports for those friends' locations. Fun stuff. I want to be able to stumble across an old news story and have a "there are 117 comments when this story was posted to Slashdot five months ago" notification.

There is so much potential here for crossover - and it's all data that already exists! Crosslinking through simple knowledge of "which person on one service is which person on another service" - and "which product on one service is which product on another service" - would open so many doors. I hope legal departments don't keep preemptively closing them. To me, this is what would excite me if it were true about "Web 2.0" - beyond just simple pretty, AJAX-enabled user interfaces. Although those are cool, too.
1. Re:Content Aggregation and Mashups by Phroggy · 2007-07-10 08:28 · Score: 3, Informative
  
  > Crosslinking through simple knowledge of "which person on one service is which person on another service" - and "which product on one service is which product on another service" - would open so many doors.
  
  Wasn't this more or less the dream of Microsoft Passport?
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
2. Re:Content Aggregation and Mashups by __aaabsi3154 · 2007-07-10 08:38 · Score: 5, Insightful
  
  I'm glad you don't care where or how the aggregation happens, but who is going to pay the bills? If you use Amazon to find local books, what does Amazon get out of it? I think the real winner will not be the person who first creates all this aggregation, but the person who does it all in a way that allows profits to be shared.
  
  But this sharing is where problems arise, as everyone thinks they're entitled to a larger share of the cash than the next person...
3. Re:Content Aggregation and Mashups by Threni · 2007-07-10 08:45 · Score: 1
  
  > if people are allowed to continue being innovative and aren't prevented by heel-dragging by legal departments
  
  If they were innovative there wouldn't be a problem. Using other people's copyrighted material is likely to cause problems though, right?
  
  > There is so much potential here for crossover - and it's all data that already exists!
  
  Yes, but it's *not your data*!
4. Re:Content Aggregation and Mashups by Applekid · 2007-07-10 08:48 · Score: 2, Insightful
  
  Only under Microsoft's model, all services are owned by them. ;)
  
  --
  More Twoson than Cupertino
5. Re:Content Aggregation and Mashups by blaster151 · 2007-07-10 08:49 · Score: 1
  
  What bills?
  
  I can already perform much of the above aggregation myself - manually and for free.
  
  If you're talking about someone investing development time for a cool browser plug-in or aggregator website that automated it for me, though . . . well, I know that I for one wouldn't mind kicking in some $$$ for something that useful.
6. Re:Content Aggregation and Mashups by blaster151 · 2007-07-10 08:52 · Score: 1
  
  In the sense that it's been served up to me, for free, I consider some of the ingredients of the mashups I described to be "my data" - my Netflix and Blockbuster queues, my friends lists on blogging sites (along with the entries they've written), etc. I'm not suggesting using some backdoor to take stuff merchants want to sell, and make it free.
7. Re:Content Aggregation and Mashups by ferd_farkle · 2007-07-10 09:02 · Score: 1
  
  You've been listening to that Berners-Lee fellow, haven't you.
8. Re:Content Aggregation and Mashups by DerekLyons · 2007-07-10 09:04 · Score: 1
  
  > I'm glad you don't care where or how the aggregation happens, but who is going to pay the bills? If you use Amazon to find local books, what does Amazon get out of it?
  
  Precisely. Along the same lines, the OP blames the 'legal departments' - he doesn't care about other people's rights, or about their ability to pay their bills. He just wants what he wants, now, Now, NOW!
  
  Other people and their rights and interests be damned.
9. Re:Content Aggregation and Mashups by raehl · 2007-07-10 09:45 · Score: 1
  
  > but who is going to pay the bills?
  
  You are. You install a browser plugin that adds the button to the Amazon.com pages that you view - it just takes the ISBN number from the Amazon page and matches it up with the ISBN number at the library and adds the button for you.
  
  Maybe you have to pay for that plugin, but quite likely it'll be a free plugin just like many plugins currently are.
  
  --
  paintball
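The ISBN matching raehl describes can be sketched in a few lines. This is only an illustration in Python rather than actual browser-plugin code, and it rests on assumptions not stated in the thread: that the product URL carries a 10-character ISBN after a `/dp/` path segment, and that the library catalog accepts an `isbn` query parameter. The catalog address and all function names are hypothetical.

```python
import re


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: prefix 978, then recompute the check digit."""
    core = "978" + isbn10[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def isbn_from_product_url(url: str):
    """Pull a 10-character ISBN out of a '/dp/<id>' path (assumed URL shape)."""
    match = re.search(r"/dp/(\d{9}[\dXx])(?:[/?]|$)", url)
    return match.group(1).upper() if match else None


def library_search_url(isbn13: str, base: str = "https://catalog.example.org/search") -> str:
    """Build a catalog lookup link; the base URL and parameter name are placeholders."""
    return f"{base}?isbn={isbn13}"


def hold_link_for(url: str):
    """Return a library lookup link for a product page, or None if no ISBN is found."""
    isbn10 = isbn_from_product_url(url)
    return library_search_url(isbn10_to_isbn13(isbn10)) if isbn10 else None
```

A real plugin would run something like `hold_link_for` against the page the user is already viewing, so the shop's servers see no extra requests.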
10. Re:Content Aggregation and Mashups by fbartho · 2007-07-10 10:15 · Score: 3, Insightful
  
  I think he's saying that Amazon and others get value by pushing their branding and ads in your face when you use them. Some percentage of users actually generate revenue even though they were only contacted through these free options. Mashups, especially vertical search engines, can cause problems for the providers: a user who currently uses that free stuff and is occasionally swayed by the ads still gets the value (and more) out of the free stuff without providing any value in return, AND many more people who didn't use the free data can profit off Amazon's grace AND often suck up their outbound bandwidth much more than if the service didn't exist. Amazon's *free data* suddenly lost much of its value to them, while also suddenly increasing in cost.
  
  --
  Gravity Sucks
11. Re:Content Aggregation and Mashups by PopeRatzo · 2007-07-10 10:55 · Score: 1
  
  The fact that there's even a question about the legality of content aggregation shows just how useless our current intellectual property law has become. The discussion we should be having is how should we replace it? I think it's too late to try to fix it. There are flaws in the underlying model.
  
  --
  You are welcome on my lawn.
12. Re:Content Aggregation and Mashups by PopeRatzo · 2007-07-10 13:00 · Score: 1
  
  > so speaks someone who has not invested millions in building up a website community or putting together original content. You might want information X to appear on website Y, but then, you didn't invest your lifes savings in the development of website X's content did you?
  
  If you've done either of these great things, why not claim a name for yourself, Mr. AC? Maybe a little self-promotion would help your courageous investment of millions and the brilliant original content you have created? Oh, you haven't done these things, you say? Well then, why do you feel my opinion is worth any less than your anonymous comment?
  
  --
  You are welcome on my lawn.
13. Re:Content Aggregation and Mashups by damelang · 2007-07-11 07:55 · Score: 1
  
  > I'm glad you don't care where or how the aggregation happens, but who is going to pay the bills?
  
  As technology (both hardware and software) progresses, these bills might become so low that it won't really matter much any more.
  
  Perhaps at that point we'll have the whole of social computing running on an open, distributed, p2p-like system so that we all share the "bills" without even thinking about it. Or are we going to continue with this walled-garden approach to user-generated content? "Open APIs" like the Facebook API aren't open enough, IMO, as Facebook is the gatekeeper and still has the final say about what can be done with the user's data.
  
  We can, and eventually will, do better.
Digg and YouTube are mashups? by RingDev · 2007-07-10 08:29 · Score: 3, Informative

Now, maybe I'm just not keen on the latest batch of synergistical leet speak, but aren't Digg and YouTube user-contribution-driven aggregators? Isn't the key feature of a mashup that it uses functionality from different web services to create a new set of functionality? Say, like tying CNN's RSS feed to Google Maps to Flickr to get an interactive graphical, geographical, news browsing interface.

Or am I just out of touch?

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
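RingDev's definition of a mashup (combining separate services into new functionality) can be made concrete with a toy example. The sketch below joins headlines from an RSS document to coordinates from a lookup table, producing records a map widget could plot. The sample feed, the `GAZETTEER` table, and the function names are all made up for illustration; no real CNN, Google Maps, or Flickr API is called.

```python
import xml.etree.ElementTree as ET

# Hypothetical place-name lookup standing in for a real geocoding service.
GAZETTEER = {"Fresno": (36.74, -119.79), "Oakland": (37.80, -122.27)}

# Made-up feed content standing in for a real news RSS feed.
SAMPLE_RSS = """<rss version="2.0"><channel><title>News</title>
<item><title>Flooding closes roads in Fresno</title><link>https://news.example.com/1</link></item>
<item><title>Markets rally</title><link>https://news.example.com/2</link></item>
</channel></rss>"""


def headlines(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title", ""), item.findtext("link", ""))
            for item in root.iter("item")]


def markers(rss_xml, gazetteer=GAZETTEER):
    """Attach coordinates to headlines that mention a known place name."""
    found = []
    for title, link in headlines(rss_xml):
        for place, (lat, lon) in gazetteer.items():
            if place in title:
                found.append({"title": title, "link": link, "lat": lat, "lon": lon})
                break  # one marker per headline is enough for this sketch
    return found
```

Each marker keeps the link back to the original story, which matters for the attribution and legality points raised elsewhere in this thread.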
1. Re:Digg and YouTube are mashups? by dunezone · 2007-07-10 08:41 · Score: 1
  
  No man, you're way in touch with everything. Therefore you are a witch, prepare to be burned at the stake.
2. Re:Digg and YouTube are mashups? by OverlordQ · 2007-07-10 08:48 · Score: 1
  
  No, calling anything web-related a 'mashup' is a horrible bastardization of the word.
  
  --
  Your hair look like poop, Bob! - Wanker.
3. Re:Digg and YouTube are mashups? by RingDev · 2007-07-10 09:23 · Score: 2, Funny
  
  Damn it. Not again!
  
  -Rick
  
  --
  "Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
4. Re:Digg and YouTube are mashups? by Mean+Variance · 2007-07-10 09:51 · Score: 1
  
  > No man, you're way in touch with everything. Therefore you are a witch, prepare to be burned at the stake.
  
  If he weighs the same as a duck, he's made of wood. And therefore ...
Re:One website's self-justifying legal disclaimer by Lucas123 · 2007-07-10 08:45 · Score: 1

Amen
Ughhh, I can't freaking stand "mashup"! by Otter · 2007-07-10 08:50 · Score: 4, Insightful

> I found myself wondering if Oodle could have avoided the ban.

If I'm understanding correctly, craigslist has terms of service, and Oodle was systematically violating them. That's their right, whether there's a formal copyright violation or not.
I'd never heard of Oodle, but craigslist is notoriously easygoing and their terms (you can run searches but not mirror the whole damn thing) seem reasonable, so I think the way Oodle could have avoided the ban is by not pissing Craig off.

--
What I'm listening to now on Pandora...
1. Re:Ughhh, I can't freaking stand "mashup"! by Anonymous Coward · 2007-07-10 10:20 · Score: 1, Informative
  
  Craig is still the chairman. Look at their site.
mashup's by jshriverWVU · 2007-07-10 08:52 · Score: 2, Interesting

I've wondered how mashups would survive. Back in the 90's companies were suing others for just linking to their site, let alone blatant copying of data feeds. It's a tricky situation.
If it's a site that is funded strictly from ads, then they have a lot to lose by others ripping their content. But at the same time mashups are a wonderful way of getting a lot of similar info together so it's a convenience to the end user.
Re:I hate vertical search engines by dotpavan · 2007-07-10 08:56 · Score: 3, Funny

I am Japanese, you insensitive clod!
Attribution and Citation by blueZhift · 2007-07-10 08:59 · Score: 3, Insightful

I don't know if mashups are legal in the strictest sense, but I do have an idea how I would want it to work. Academic publications are impossible to produce without citing the work of others. That's how research works. Information that did not originate with the author is attributed to its respective source(s). No muss, no fuss, usually, and there are accepted conventions for how this is done. Right now I don't think the web has any such accepted conventions, but it should. Practically speaking, it would be impossible to close down all aggregation sites anyway, so the best course of action, imho, would be to develop standards for citing information that comes from other sources. While these still can't be enforced 100%, peer pressure should at least give people the idea that citing sources is a good thing.

--
To the making of books there is no end, so let's get started
1. Re:Attribution and Citation by DGolden · 2007-07-10 10:47 · Score: 1
  
  Well that sounds fairly sensible. It'll never catch on. ;-)
  
  --
  Choice of masters is not freedom.
2. Re:Attribution and Citation by stubear · 2007-07-10 11:26 · Score: 1
  
  That only covers half the problem. How do you keep others from profiting from your work? Why should another site be allowed to sell ad space on their site and generate revenue from content I produced without sharing that revenue with me?
I am making something similar by Safiire+Arrowny · 2007-07-10 09:32 · Score: 2, Interesting

I am making something similar to create notifications for posts on craigslist right now. It is written in Ruby, and it basically enters the sections you specify on craigslist, and downloads and stores the last 100 postings into an SQLite3 database.

Then, as a human might do if he were obsessive, it checks the section indexes for updates, say every 10 minutes, and incrementally stores new posts.

The data in SQLite is then indexed by the Ferret search engine library, so that it can perform searches on the post content, and it uses gtk2's libnotify to pop up a notification bubble if it has found anything you previously said you were interested in.

I have not gotten banned in any way from craigslist, and I don't expect to be, since beyond the initial download of the sections, it behaves no different than an obsessive human who might be looking at 10 pages every X minutes. With this, I would be necessarily one of the first people to notice anything on the site that I'm interested in.

I will probably release this on my site for everyone. I'm aware it's against the terms of service to completely mirror the entire site, but does this count as mirroring? Can it be deemed similar to grepping your Firefox cache, or personal mirroring and indexing?

I know I'm sure as hell going to use it, that's why I made it, but it is an awkward feeling that if I give it away for free and people liked it, that I could get into some kind of trouble.
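The store-incrementally-then-match loop described in this post can be sketched as follows. This is not the poster's Ruby program: it is a Python illustration that swaps the Ferret index for a plain substring match, leaves out the libnotify popup, and takes the page download as a caller-supplied function (`fetch_listing`, a hypothetical name) so nothing here touches the network.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local post store, keyed by post URL."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return db


def store_new(db, posts):
    """Store posts not seen before and return only those new ones."""
    fresh = []
    for post in posts:
        seen = db.execute("SELECT 1 FROM posts WHERE url = ?", (post["url"],)).fetchone()
        if seen is None:
            db.execute(
                "INSERT INTO posts (url, title, body) VALUES (?, ?, ?)",
                (post["url"], post["title"], post["body"]),
            )
            fresh.append(post)
    db.commit()
    return fresh


def matches(post, keywords):
    """Case-insensitive substring match; a stand-in for a real search index."""
    text = (post["title"] + " " + post["body"]).lower()
    return any(keyword.lower() in text for keyword in keywords)


def check_once(db, fetch_listing, keywords):
    """One polling pass: fetch, store what is new, return new posts of interest."""
    return [post for post in store_new(db, fetch_listing()) if matches(post, keywords)]
```

Calling `check_once` on a timer (the post suggests every 10 minutes) reports each matching post only once, because posts already in the store are skipped on later passes.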
1. Re:I am making something similar by mokumegane · 2007-07-11 01:46 · Score: 1
  
  I'd say the best way to find out if it's against craigslist's ToS/rules/w/e is to covertly ask them about the gray areas of their rules. Try and describe your program without saying it's your program. I personally don't see a problem with it because as I understand, it doesn't mirror... it searches and finds what you want, then gives it to you. Copying some items on the list isn't mirroring the whole list. But hey, I'm not Craig...
First Hand Experience by w0lver · 2007-07-10 09:38 · Score: 4, Interesting

I have experience at two companies that did site aggregation. The first was a company that did travel deals by searching other sites, and the next was a job site that did the same. Searching and presenting a summary with a link to the real live content is legal. Taking the content and re-purposing it, even with credit, is illegal. So as an example, with a travel site, searching all the airlines, Expedia and so on, and displaying links with prices is valid. However, showing the flights and prices without links and then booking it in the background, never displaying the site, is illegal. We had a number of companies that tried to sue us; we sent over legal opinions and case history on the topic, and the suits would disappear. However, we did have a few sites that blacklisted our IPs, tried to break our scraper, and posted nasty things about us on other sites.
Sounds similar by neoform · 2007-07-10 09:46 · Score: 1

I wonder what the rules are surrounding my site..

Then again, news = current events and current events are not copyrightable..

--
MABASPLOOM!
1. Re:Sounds similar by coaxial · 2007-07-10 20:25 · Score: 1
  
  > Then again, news = current events and current events are not copyrightable.
  
  Perhaps not, but reports about the current event are, and always have been.
2. Re:Sounds similar by neoform · 2007-07-11 02:27 · Score: 1
  
  Yes, but the copyright rules around reporting on current events are different than regular copyrighted material.
  
  --
  MABASPLOOM!
Yahoo Pipes and Craigslist by Mean+Variance · 2007-07-10 09:56 · Score: 1

I have found Yahoo Pipes to be an indispensable companion to Craigslist RSS feeds. I can plop in feeds from, say, Fresno and SF Bay, search with positive, negative, and grouped searches, and restuff the results back into a new RSS feed.
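For readers who never used Pipes, the merge, filter, and re-emit flow described here looks roughly like the Python sketch below. It is not how Yahoo Pipes was implemented. Feed items are assumed to be already parsed into dicts with `title` and `link` keys, and the rule names `must_have`, `must_not`, and `any_of` are invented to stand for the positive, negative, and grouped searches.

```python
import xml.etree.ElementTree as ET


def wanted(item, must_have=(), must_not=(), any_of=()):
    """Apply positive, negative, and grouped (any-of) keyword rules to one item."""
    text = f"{item['title']} {item.get('description', '')}".lower()
    if any(term.lower() not in text for term in must_have):
        return False
    if any(term.lower() in text for term in must_not):
        return False
    if any_of and not any(term.lower() in text for term in any_of):
        return False
    return True


def merge_and_filter(feeds, **rules):
    """Merge several feeds, drop duplicate links, keep items passing the rules."""
    seen, kept = set(), []
    for feed in feeds:
        for item in feed:
            if item["link"] not in seen and wanted(item, **rules):
                seen.add(item["link"])
                kept.append(item)
    return kept


def to_rss(items, title="Filtered listings"):
    """Serialize the kept items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")
```

Because each emitted item keeps its original link, a reader of the new feed still lands on the source listing.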
Re:One website's self-justifying legal disclaimer by antic · 2007-07-10 10:30 · Score: 3, Insightful

"I came across this page..."

Doesn't the submitter mean "I wrote this page and thought I could get it on /. for some free publicity..."?

--
'Thats they exact same thing a banana wrench monkey.'
Legality, Politeness and De facto Standards by logicnazi · 2007-07-10 11:24 · Score: 1
As an aside, let me first just say this is a terrible slashdot posting on an interesting subject. The linked article is nothing but an ad (well, an about page) for yet another job search company. Kudos to their marketing team for getting on slashdot though.

Anyway the comments so far seem to be blurring together several important but very different notions.
1. The legality of crawling another company's publicly available web site and sucking down their content
2. The legality of republishing that content in some manner, either as snippets in a search or in some other fashion
3. The unfairness or harm that one might inflict on another company by doing 1 or 2
All of these are interesting, difficult questions, but let me say a few words about each in turn.

--
For #1 I am reasonably confident that the courts would find that crawling so resource-intensive that it effectively amounted to a DoS attack was banned, but it's unclear where this line would be drawn. For instance, is crawling that just causes a noticeable slowdown to other users enough to place one in this category? Does the size of the website suffering the slowdown matter, or how frequently it happens? It would be unfortunate if archivists were forced to let any owner of a public page opt out because they don't know whose pages are being hosted over 56k modems. A good resolution to this problem will most likely have to await significant agreement on a de facto set of rules for playing nice that Congress could then baptize into law. At the moment, so long as a reasonable person wouldn't call what you're doing a DoS attack, you're probably safe in regards to the pure server load issue (though IANAL).

A more interesting question here is whether someone crawling your site is bound to follow your terms of service. Those silly little "you are not a member of law enforcement or the RIAA" access requirements have not held up in court, suggesting that a totally open website like craigslist can't demand you accept its terms of use just to crawl it (and your bot surely didn't sign a contract).
---

#2 gets a bit more tricky because now we are talking about copyright law. Obviously if you merely duplicate all their work and host it on your site you will have to pay up when they sue. However, it seems clear that in US courts a transformative use, like creating a search engine, that only displays small snippets of the original work is in the clear. True meta-search engines that repackage the search results of a few search engines seem to be on more shaky ground. So while IANAL it seems to me that if Oodle had been indexing craigslist as one site among many they would have been able to (eventually) win a lawsuit.

--
#3 is where the real action is. While you can probably legally get away with being a dick to some websites, practically speaking a de facto standard of good manners for web crawling and indexing is very important. Not only do we need a generally accepted sense of what is fair before we can pass the right laws; as a practical matter, if you don't comply with the de facto standards you will suffer. Once there is an accepted standard of behavior, like robots.txt, companies that flagrantly disregard it will find the hosts they are trying to crawl entering into an arms race with their crawling software. Oodle may have been in the legal right, but even if so, the practical difficulties of battling craigslist's web server team may have made it an unattractive prospect. On the other side of the table, if you don't play nice with the bots and let them index fairly you may find your site delisted from Google and similar search engines.
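As an illustration of what honoring robots.txt involves on the crawler side, here is a small sketch using Python's standard `urllib.robotparser`. The sample rules, the `examplebot` agent name, and the 10-second fallback delay are assumptions made for the example; they are not taken from craigslist or any real site.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for the example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_plan(parser, agent, urls, default_delay=10.0):
    """Return (url, seconds_to_wait) pairs for the URLs this agent may fetch."""
    delay = parser.crawl_delay(agent)  # None when the site states no Crawl-delay
    wait = float(delay) if delay is not None else default_delay
    return [(url, wait) for url in urls if parser.can_fetch(agent, url)]
```

A crawler that sleeps for the returned delay between requests and skips the excluded paths is following the convention as written. Whether that also satisfies a site's terms of service is the separate question raised under #1.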

Unfortunately it is totally unclear what the right standard in this area should be. Most people agree that the search should normally send people back to the original page but when is it okay to cache? It wouldn't be cool to copy all the posts from craigslist and republish them on your own site (almost certainly illegal) but what about copying all the data displayed in
--
If you liked this thought maybe you would find my blog nice too:
taking other people's content and adding more ads. by microcars · 2007-07-10 11:32 · Score: 1

Another site, ABCFREE.COM, used to do the same thing, but they would stick Google Ads in between the actual scraped content so you were more inclined to accidentally click a Google Ad than the classified that you really wanted to see.
ABCFREE.COM seems to have lost their Google Ad account because of this, and then I guess it was not worth scraping Craigslist anymore, because the site has had a "down for maintenance" page up for quite some time now.
How many other sites had a business plan like this based on scraping Craigslist and sticking up Google Ads?
Oodle may be a good site, but it appears that many other sites decided to do the same thing around the same time span.
If I was running Craigslist I would wonder why the hell all these other sites were sucking our bandwidth and content and I would cut them off too.

--
I like microcars
Mashups? by Shinra · 2007-07-10 13:21 · Score: 1

First thought that came to my mind were Artists from clearly
different genres of music collaborating for a song, such as
Eminem and Elton John, or Nelly and Tim McGraw, or when a producer
of a Mix-tape samples various older songs from different genres and makes
some sort of a dance mix, or even an entirely new song.

Those would be cool to see more of.
Job boards are ruthless by BigBrownChunx · 2007-07-11 20:41 · Score: 1

Is it just me, or are job boards the worst offenders of "please don't use our content"? This has happened recently in Australasia with the takedown of jobby.co.nz and the legal threats of seek.com.au to myspider.com.au (blogged about at http://www.engageonline.co.nz/blog/?p=84).

What really miffs me, is how the job boards can say they "own" the content, when actually, it's been posted by other people on these sites and is really their content.