Facebook Crawler Speaks Back

Posted by CmdrTaco on Tuesday April 6, 2010 @01:42AM from the everybody-litigate-now dept.

Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.

Pretty naive by elrous0 · 2010-04-06 01:43 · Score: 5, Insightful

Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robot.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.

--
SJW: Someone who has run out of real oppression, and has to fake it.
1. Re:Pretty naive by Anonymous Coward · 2010-04-06 01:46 · Score: 5, Insightful
  
  If he's in the right, but not having as much money as a big corporation means he'll lose anyway, then your U.S. court system is broken. Please fix it.
2. Re:Pretty naive by qoncept · 2010-04-06 01:51 · Score: 5, Interesting
  
  matter if he doesn't have the big $ needed to hire lawyers
  
  Thank you. I ran an open source project for a few years and came home one night to find that my webhost had taken its site down after being contacted by a company with a similar name. The company claimed they'd tried to contact me and explained how my project was causing them harm, but the simple fact of the matter was that my project's name did not infringe on theirs.
  
  I ended up renaming the project. I've told the story dozens of times, and the response is always the same. "That's BS! They can't do that! Go to court!" People don't understand that $20 a month in unmanaged Google ads doesn't cover lawyers the same way that company's actual paying customers do.
  
  --
  Whale
3. Re:Pretty naive by julesh · 2010-04-06 02:15 · Score: 5, Interesting
  
  It takes a babe in the woods to think he can just waltz in and take that away with a "But your robot.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
  Yes. Apart from anything else, he's just about entirely missing Facebook's point. Facebook don't give a shit how he accesses their site; this has nothing to do with the fact that he spidered it in a way that their robots.txt file allows, and everything to do with the fact that he was *redistributing their data* without consent.
  Now, the question becomes whether what he was distributing falls under fair use. This is a very tricky question, and has nothing to do with how he acquired it.
4. Re:Pretty naive by Pharmboy · 2010-04-06 02:18 · Score: 5, Interesting
  
  American justice might be blind, but it knows what money smells like. One more reason why we need judicial reform to prevent abuses like this. Of course fighting it wouldn't be worth it, as even if you won, your "winnings" would have only been the ability to continue using the name. Another good example is http://www.nissan.com, where he actually fought and won, at a great price. His name is Nissan, and his computer business and name existed back when the cars were called "Datsun", but they sued anyway. This is another one of those "We are bigger than you, thus more deserving of the domain name than you" cases.
  
  --
  Tequila: It's not just for breakfast anymore!
5. Re:Pretty naive by Lumpy · 2010-04-06 02:36 · Score: 5, Insightful
  
  We have the best court system money can buy!
  
  --
  Do not look at laser with remaining good eye.
Mark Zuckerberg by prayag · 2010-04-06 01:49 · Score: 5, Interesting

Mark Zuckerberg is the most unethical guy in the industry today. That much is obvious from the origins of Facebook, his infamous hacking of journalists' passwords during the the-facebook era, and countless other fiascoes that make the news from time to time. Everyone who has ever dealt with him has bad things to say about him.
If he is the face of the next generation of entrepreneurs, then God save the industry.
Re:obviously this is abusive by ikoleverhate · 2010-04-06 02:03 · Score: 5, Interesting

how about if he rejigged his crawler to get the data from the google cache instead? So he'd never get anything from facebook or enter into any implied agreement with them.
Ooo, deja vu by lxt · 2010-04-06 02:10 · Score: 5, Insightful

It's sort of ironic that Facebook is trying to stop someone crawling public profiles on their site, because that's exactly what Mark Zuckerberg did while he was at Harvard (I was a grad student in the CS department at the time).
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students' info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's ad-board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
Re:Arachnophobia by OnlyJedi · 2010-04-06 02:12 · Score: 5, Informative

From the Statement of Rights and Responsibilities, Section 3 "Safety":

2. You will not collect users' content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.
The question then becomes how enforceable is the agreement? Sure, if he has an account Facebook can close it, but if he is just accessing Facebook without an account do they have a case? Last I saw you can browse parts of profiles without being logged in, and without ever agreeing to any terms.
Facebook's privacy policy by whencanistop · 2010-04-06 02:22 · Score: 5, Informative

Facebook's privacy policy says:

“Everyone” Privacy Setting. Information set to “everyone” is publicly available information, may be accessed by everyone on the Internet (including people not logged into Facebook), is subject to indexing by third party search engines, may be associated with you outside of Facebook (such as when you visit other sites on the internet), and may be imported and exported by us and others without privacy limitations. The default privacy setting for certain types of information you post on Facebook is set to “everyone.” You can review and change the default settings in your privacy settings. If you delete “everyone” content that you posted on Facebook, we will remove it from your Facebook profile, but have no control over its use outside of Facebook.
I'd also like to point out in their terms:

When you publish content or information using the "everyone" setting, it means that everyone, including people off of Facebook, will have access to that information and we may not have control over what they do with it.