Facebook Crawler Speaks Back
Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.
Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.
SJW: Someone who has run out of real oppression, and has to fake it.
...you are supposed to scan scumbags.txt, not robots.txt.
Stupid, but ballsy. Gotta give credit where it's due.
Living With a Nerd
Mark Zuckerberg is the most unethical guy in the industry today. That much is obvious from the origins of Facebook, his infamous hacking of journalists' passwords during the the-facebook era, and the countless other fiascoes that make the news from time to time. Everyone who has ever dealt with him has bad things to say about him.
If he is the face of the next generation of entrepreneurs, then God save the industry.
The guy's work looks somewhat interesting. I don't see why he can't just make it a Facebook app or something that just happens to cross over onto the rest of the internet as well. Maybe that would have helped him fly under their radar, if it was seen as something that enhanced Facebook.
But seems like his problem all along was lack of publicity, which /. will surely help with.
That said, call me old-school, but I've had more fun with things like ircstats. So I'm mostly still waiting for this new social crap to catch up.
I might be alone here but spiders revolt me to a point where I simply respect them and leave them alone.
But that said, Google operates a spider, pretty much. So we have to look at any potential spider on the internet like we might look at Google. If he followed the Robots.txt as Facebook set it up and he didn't try to misunderstand it, then there isn't anything they can do. Although, I'm pretty sure the Facebook EULA says you can't spider them so he's SOL anyway if that's the case. This should be a long and drawn out case unless there is a settlement.
Facebook is ripe. People put up EVERYTHING about themselves on there. I never accept a friend request unless I know the person and I offer a challenge question often. If it's not responded to adequately, I simply ignore them. But in the end there isn't much you can do. If you put it on Facebook -- consider it public, like if it was in the phone book.
The dangers of knowledge trigger emotional distress in human beings.
Assuming what he did produces a valuable result.
If it's defensible in court by an entity with enough cash or lawyer might, why is there no such entity doing the same thing and then fighting facebook in court?
If it isn't defensible in court, why does it matter that he didn't fight because he didn't have the money?
this is what the guy should do:
1. engage the lawsuit
the downside is financial exposure. so incorporate your work in such a way that it can't hit your personal finances. the upside is massive exposure. you will achieve some level of fame: the guy who finally gave the robots.txt convention a legal status quo. this will help you professionally, as well as make your life story
2. whine to google
you are completely right that google shouldn't have to get permission every time it wants to crawl the site. therefore GET GOOGLE TO DEFEND YOU
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
I'm pretty sure robots.txt doesn't count as a legal document.
and it's not the fact that he downloaded it all, it's the fact that he is distributing it.
Do I really need to say anything else at this point?
In Soviet Russia jokes are formulaic and decidedly non-humorous.
FTFA:
All Facebook is doing is nailing has "anal".
I am not anything even approaching a lawyer, but I suspect his actions were probably legal. The Internet is a public medium; unless you specifically put walls around content, it has the same protection as if you posted fliers on a physical bulletin board in a public place. Yes, you retain copyright over your content, but you have ZERO ability to say "by reading this, you agree to additional terms". If I want to produce a review of all the fliers posted around town, I can. If I want to make excerpts (within "Fair Use") I can. Pretty much the only thing I can't legally do is deface them or copy them outright. Unless he was doing this from a logged-in account, I can't see how they can limit what sorts of derivative works he makes. (So long as the derivative doesn't violate copyright.)
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's ad-board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
What is stopping someone from crawling for the same data, and posting it anonymously on something like Wikileaks or the like?
The information that could be gleaned from this dataset is immense, and the one set of data could be analyzed in different ways for years to come.
That would be kombucha.
80% of the people on it, and the people who run it. laugh zuckerberg.... you evil queen, you will become IRRELEVANT
I will persist.
I'd also like to point out in their terms:
mod +6
So, Mark, you say Facebook have a reasonable expectation for privacy of its data? Isn't privacy passe now? Or did I hear you wrong?
> LGBT used the toilet war [transboutique.com]
You understood wrong, someone said "Yeah we boys _got screws_ around".
This is the wrong forum.
Threats of legal action are not a lawsuit. He didn't get sued. He got bluffed. I don't blame him for caving in, but he shouldn't mislead people by referring to the receipt of threats from lawyers as being sued (this is the sort of error I expect from the Slashdot editors, of course).
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Privacy is a misguided concept introduced at the same time as when cavemen tried to hide behind trees whilst relieving themselves. It is public data ergo it is public data. Case solved.
will the real EFF please standup
because, they put a robots.txt file in their root folder which allowed him to crawl everything.
its facebook's fault.
Read radical news here
imagine - you put a robots.txt in your root directory, allow crawlers to crawl everything, and then sue those who crawled your stuff.
facebook is not even an established, long standing part of the big capital elite, they are startups, who are from the new generations and from the new tech age.
but see, once they became big capital, they started trying to stomp down others with their copy'right' and big money just the same, despite having come out of our own lot in the recent decade.
this shows that, regardless of generation or culture, having copyrights and big capital eventually causes intellectual feudalism favoring the rich elite, EVEN if they are in the wrong.
One reform that would help - not completely, because deep pockets would still win out a lot - is the idea of "loser pays." One of the few decent legal ideas out of Europe, and helps prevent frivolous suits...
You're all missing the point: this guy has no ethics... "I used robots.txt for info". Care to think about that statement?
I'll be counting the time until some other programmer in a country Facebook can't touch (or who does it anonymously), scrapes the exact same data and pushes it up on a torrent, purely because of this story.
Although I'm not sure why Facebook are so concerned. The friend data is relatively fluid: whilst there will be long-term friendships, people add new friends all the time (and, I'd guess, to a lesser extent remove people who are no longer friends).
Plus the data isn't THAT reliable, I have several friends who, purely as a game have 1000+ friends and counting, purely to see how many they can get.
without prior written permission. Lose the fight. Then go crawl facebook.
That's if you believe any of that crap, anyway.
OT: Has the FSM made claim to any name (or Name) as far as anyone knows?
New mod option wanted: -1 DrunkenRambling
its degeneration into low iq racism and ignorance is sad, but it is what it is
rusty should close that ghetto down
hell, i'll offer rusty $$$ someday to buy the site from him, just so i can have the pleasure of personally pulling the plug on that pool of filth
the site once had its charms, but its worthy of nothing but condemnation now. the (l)users who still continue to haunt that place are truly degenerates: /b/tards, without the wit, if you can imagine such a thing
Why isn't this covered by copyright? Just because I put my information on the web, doesn't mean you can reproduce it.
The Supreme Court has held for a very long time now that the right to free speech means the right to anonymous speech, especially political speech.
Yes.. for people. But not necessarily for organizations.
Of course, making such a distinction will require reversing a very old (and recently reinforced) precedent in US law, where organizations have personhood. Probably requiring an amendment. So it won't happen.
Well, data is mine if I collect it. Otherwise it is not mine. In this case the author clearly independently collected the data (w/o re-publishing it verbatim).
Copyright applies to creative works; data collection with a computer program (the author) hardly qualifies. For that matter, allowing people to publish data with a computer program (the popular website) is hardly creative either.
On top of that, a high US court (don't remember if it was the Supreme Court) has recently confirmed that simple lists of data (like a phone book) can't be protected by copyright.
I'd even argue that robots.txt is a contract, simply by its pervasive use and the clearly understood rules around it. Under the law Facebook can hardly tell everybody "go and spider my pages" and then say to individual parties that it did not mean it. If Facebook wanted more control, it could just as well close its robots.txt and invite anybody to open negotiations with them to spider their site under a different (individual) contract.
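For anyone who hasn't looked at how the convention works in practice, here is a minimal sketch of the "open" versus "closed" robots.txt distinction, using Python's standard urllib.robotparser. The two rule sets, the ExampleBot user agent and the example.com URL are all made up for illustration; none of this is Facebook's actual robots.txt, and it says nothing about what a court would make of it.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "open" file: an empty Disallow means nothing is off limits.
OPEN_RULES = """\
User-agent: *
Disallow:
"""

# Hypothetical "closed" file: every path is disallowed for every crawler.
CLOSED_RULES = """\
User-agent: *
Disallow: /
"""

# Placeholder URL, not a real profile page.
URL = "https://example.com/people/some-profile"

print(allowed(OPEN_RULES, "ExampleBot", URL))
print(allowed(CLOSED_RULES, "ExampleBot", URL))
```

A well-behaved crawler runs a check like this before every fetch. Note that robots.txt is only advisory: nothing in the protocol itself stops a crawler that ignores it, which is why the argument here is about contracts and courts rather than code.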
I've read through the visible comments, and all of them seem to miss the point: the legal system has just operated in reverse. Rather than preventing the stronger entity from stealing from the weaker, it was actually the means by which the stronger DID the stealing.
Here, so far as I can tell, is what happened: The guy pulled a bunch of PUBLICLY AVAILABLE data from Facebook, connected it in new and interesting ways, offered to sell the product of his hard work to other entities, and then had to delete it all because Facebook got antsy and sued him, and he didn't have the money to defend himself. And of course, Facebook will now take the same ideas, and build up and sell their own datasets.
This is akin to bullies using school rules to steal homework from nerds and turn it in under their own names.
Registering a corporation costs what, $100 these days? Even if you could pass such a law, if General Evil-Overlord Services, Inc., wants to spend money in secret, they can have their subsidiary "Concerned Citizens For A Better Tomorrow, Inc." buy the ads, with an arbitrary mesh of obfuscatory shell companies in between just in case pesky investigative reporters or financial auditors want to follow the money, and if necessary throw in some (more expensive) policy research foundations or other cutouts if they're particularly disreputable or have a lot of money to hide. It's not like they don't do this today to work around current ad-labeling laws.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
One of the important points made in the Citizens United case was that it's hard to write regulations that can distinguish between for-profit media companies like the NYT/WSJ/FoxNoise and other corporations. Personally, I don't see how the court could have let a law stand that blocks Whiny Republicans United from making a "Hillary Is Evil" movie without also blocking Michael Moore from making a "Bush is Evil" movie (artistic merit's really not the law's territory :-)
The Sierra Club's a different case - aren't they a 501c3 non-profit, which gets to be tax-deductible in return for limitations on its activities? Or if they're not, there are certainly issue advocacy groups that are.