Facebook Crawler Speaks Back
Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a
response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.
Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.
SJW: Someone who has run out of real oppression, and has to fake it.
Stupid, but ballsy. Gotta give credit where it's due.
Mark Zuckerberg is the most unethical guy in the industry today, as is obvious from the origins of Facebook, his infamous hacking of journalists' passwords during the the-facebook era, and countless other fiascoes that make the news from time to time. Everyone who has ever dealt with him has bad things to say about him.
If he is the face of the next generation of entrepreneurs, then god save the industry.
The guy's work looks somewhat interesting. I don't see why he can't just make it a Facebook app or something that just happens to cross over onto the rest of the internet as well. Maybe that would have helped him fly under their radar, if it was seen as something that enhanced Facebook.
But seems like his problem all along was lack of publicity, which /. will surely help with.
That said, call me old-school, but I've had more fun with things like ircstats. So I'm mostly still waiting for this new social crap to catch up.
I might be alone here but spiders revolt me to a point where I simply respect them and leave them alone.
But that said, Google operates a spider, pretty much. So we have to look at any potential spider on the internet like we might look at Google. If he followed the robots.txt as Facebook set it up and he didn't try to misunderstand it, then there isn't anything they can do. Although, I'm pretty sure the Facebook EULA says you can't spider them, so he's SOL anyway if that's the case. This should be a long and drawn out case unless there is a settlement.
Facebook is ripe. People put up EVERYTHING about themselves on there. I never accept a friend request unless I know the person and I offer a challenge question often. If it's not responded to adequately, I simply ignore them. But in the end there isn't much you can do. If you put it on Facebook -- consider it public, like if it was in the phone book.
The dangers of knowledge trigger emotional distress in human beings.
Assuming what he did produces a valuable result.
If it's defensible in court by an entity with enough cash or lawyer might, why is there no such entity doing the same thing and then fighting facebook in court?
If it isn't defensible in court, why does it matter that he didn't fight because he didn't have the money?
this is what the guy should do:
1. engage the lawsuit
the downside is financial exposure. so incorporate your work in such a way that it can't hit your personal finances. the upside is massive exposure. you will achieve some level of fame: the guy who finally gave the robots.txt convention a legal status quo. this will help you professionally, as well as make your life story
2. whine to google
you are completely right that google shouldn't have to get permission every time it wants to crawl the site. therefore GET GOOGLE TO DEFEND YOU
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
I am not anything even approaching a lawyer, but I suspect his actions were probably legal. The Internet is a public medium; unless you specifically put walls around content, it has the same protection as if you posted fliers on a physical bulletin board in a public place. Yes, you retain copyright over your content, but you have ZERO ability to say "by reading this, you agree to additional terms". If I want to produce a review of all the fliers posted around town, I can. If I want to make excerpts (within "Fair Use") I can. Pretty much the only thing I can't legally do is deface them or copy them outright. Unless he was doing this from a logged in account, I can't see how they can limit what sorts of derivative works he makes. (So long as the derivative doesn't violate copyright)
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's ad-board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
I'd also like to point out in their terms:
So, Mark, you say Facebook have a reasonable expectation for privacy of its data? Isn't privacy passe now? Or did I hear you wrong?
Threats of legal action are not a lawsuit. He didn't get sued. He got bluffed. I don't blame him for caving in, but he shouldn't mislead people by referring to the receipt of threats from lawyers as being sued (this is the sort of error I expect from the Slashdot editors, of course).
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
robots.txt isn't a legal document, it's an accepted industry standard way for web sites to limit what web spiders and other web robots can search for. Facebook's robots.txt file basically welcomes everyone to come on in and search their site. Complaining that someone used the data that you gave them permission to access is like realtors complaining that someone is visiting open houses they sponsor and then publishing an analysis of houses for sale based on data gathered during those visits. If Facebook doesn't like that others can aggregate data on their site they should get the industry to agree to a new standard tag that permits crawling but forbids aggregation.
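For anyone who hasn't seen how a polite crawler actually consults robots.txt, here is a rough sketch using Python's standard urllib.robotparser module. The rules, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration; this is not Facebook's real file and not necessarily how the crawler in the story worked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, invented for this example (NOT Facebook's actual file):
# every robot may crawl everything except paths under /private/.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URLs below are placeholders.
    for url in ("https://example.com/people/someone",
                "https://example.com/private/settings"):
        print(url, is_allowed(HYPOTHETICAL_ROBOTS_TXT, "ExampleBot", url))
```

Note that the file only says which paths a robot may fetch. It has no way to express "you may crawl this but not aggregate it", which is exactly the gap described above.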
imagine - you put a robots.txt in your root directory, allow crawlers to crawl everything, and then sue those who crawled your stuff.
facebook is not even an established, long standing part of the big capital elite, they are startups, who are from the new generations and from the new tech age.
but see, once they became big capital, they started trying to stomp down others with their copy'right' and big money just the same, even though they came out of our own lot in the last decade.
this shows that, regardless of generation or culture, having copyrights and big capital eventually causes intellectual feudalism favoring the rich elite, EVEN if they are in the wrong.
One reform that would help - not completely, because deep pockets would still win out a lot - is the idea of "loser pays." One of the few decent legal ideas out of Europe, and helps prevent frivolous suits...
The Supreme Court has held for a very long time now that the right to free speech means the right to anonymous speech, especially political speech.
Yes.. for people. But not necessarily for organizations.
Of course, making such a distinction will require reversing a very old (and recently reinforced) precedent in US law, where organizations have personhood. Probably requiring an amendment. So it won't happen.
It's important to realise that media / arts can be copyrighted, as they are ostensibly physical products (although that tends to include digital media these days) that have been created by someone, so your MP3 can be copyrighted and the rights holder protected.
Your name, address and phone number are NOT copyrightable, because they are not considered artforms with a physical manifestation, they are merely facts.
I am a human male, 42 years old, living in the Philippines. These facts can NOT be copyrighted.
Now apply this rule to Facebook. Sure, any photos and videos you post CAN be copyrighted, as they are physical artforms. Your personal bio cannot.
And as TFA was about scraping biodata, i.e. non-copyrightable data, the guy had a perfectly valid case, but was scared off by the big boys brandishing loaded lawyers.
And you still need to get a grip.
I've read through the visible comments, and all of them seem to miss the point: the legal system has just operated in reverse. Rather than preventing the stronger entity from stealing from the weaker, it was actually the means by which the stronger DID the stealing.
Here, so far as I can tell, is what happened: The guy pulled a bunch of PUBLICLY AVAILABLE data from Facebook, connected it in new and interesting ways, offered to sell the product of his hard work to other entities, and then had to delete it all because Facebook got antsy and sued him, and he didn't have the money to defend himself. And of course, Facebook will now take the same ideas, and build up and sell their own datasets.
This is akin to bullies using school rules to steal homework from nerds and turn it in under their own names.