Facebook Crawler Speaks Back
Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a
response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.
Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.
Mark Zuckerberg is the most unethical guy in the industry today, as is obvious from the origins of Facebook, his infamous hacking of journalists' passwords during the the-facebook era, and countless other fiascoes that come up in the news from time to time. Everyone who has ever dealt with him has bad things to say about him.
If he is the face of the next generation of entrepreneurs, then God save the industry.
How about if he rejigged his crawler to get the data from the Google cache instead? Then he'd never get anything from Facebook or enter into any implied agreement with them.
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students' info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's ad-board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
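For anyone wondering what "following robots.txt" amounts to in practice, here is a rough sketch using Python's standard urllib.robotparser. The rules, user agent name and URLs below are made up for illustration; this is not Pete Warden's actual crawler, and it is not Facebook's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-crawler" and example.com are placeholders, not real names.
    for url in ("https://example.com/people/someone",
                "https://example.com/private/settings"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "example-crawler", url))
```

A well-behaved crawler runs a check like this before every request and skips anything disallowed. Note that this only covers what the site operator published in robots.txt; it says nothing about terms of service.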
From the Statement of Rights and Responsibilities, Section 3 "Safety":
2. You will not collect users' content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.
The question then becomes: how enforceable is the agreement? Sure, if he has an account, Facebook can close it, but if he is just accessing Facebook without an account, do they have a case? Last I saw, you can browse parts of profiles without being logged in, and without ever agreeing to any terms.
I'd also like to point out in their terms: