Facebook Crawler Speaks Back
Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.
Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.
SJW: Someone who has run out of real oppression, and has to fake it.
I might be alone here but spiders revolt me to a point where I simply respect them and leave them alone.
But that said, Google operates a spider, pretty much. So we have to look at any potential spider on the internet like we might look at Google. If he followed the robots.txt as Facebook set it up and he didn't deliberately misread it, then there isn't anything they can do. Although, I'm pretty sure the Facebook EULA says you can't spider them, so he's SOL anyway if that's the case. This should be a long, drawn-out case unless there is a settlement.
Facebook is ripe. People put up EVERYTHING about themselves on there. I never accept a friend request unless I know the person, and I often offer a challenge question. If it's not answered adequately, I simply ignore them. But in the end there isn't much you can do. If you put it on Facebook -- consider it public, as if it were in the phone book.
The dangers of knowledge trigger emotional distress in human beings.
this is what the guy should do:
1. engage the lawsuit
the downside is financial exposure. so incorporate your work in such a way that it can't hit your personal finances. the upside is massive exposure. you will achieve some level of fame: the guy who finally gave the robots.txt convention a legal status quo. this will help you professionally, as well as make your life story
2. whine to google
you are completely right that google shouldn't have to get permission every time it wants to crawl the site. therefore GET GOOGLE TO DEFEND YOU
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Not really ballsy considering he didn't actually let Facebook's challenge of "The only legal way to access any web site with a crawler was to obtain prior written permission" go to court. Maybe he should have gone to the EFF for help as the repercussions of a judge actually deciding in Facebook's favor would have been devastating to the web.
Reviewing just the first hour of video games.
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students' info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's Ad Board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
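For anyone wondering what "followed robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleCrawler` user agent, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration; this is not Facebook's actual robots.txt, and not necessarily how the crawler in the story was written.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Facebook's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/people/jane", "/private/messages"):
        url = "https://example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

A polite crawler runs a check like this before every request and skips anything disallowed. Note that passing this check only means the site's published crawl rules were honored; it says nothing about the terms of service, which is the gap the comments above are arguing about.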