Slashdot Mirror


Facebook Kills Dataset of Crawled Public Profiles

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated 210 million public Facebook profiles and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and Facebook's robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he could build it, anyone else who wants this data has probably made their own version already.

158 comments

  1. For an Interesting Exercise in Head Asplosion by eldavojohn · · Score: 4, Interesting

    Fearing costs, Warden has now destroyed his dataset.

    Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

    Then Facebook could ask the EFF to protect their users' privacy and information from being sold to marketers and corporations (sorry, when you're introduced as "Internet entrepreneur" that means there's profit to be had).

    --
    My work here is dung.
    1. Re:For an Interesting Exercise in Head Asplosion by paeanblack · · Score: 4, Insightful

      Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

      Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

      If you don't like that reality, keep it in mind next time you vote.

    2. Re:For an Interesting Exercise in Head Asplosion by truthsearch · · Score: 1, Redundant

      Except Facebook is claiming he violated its terms of service (a contract), not the law.

    3. Re:For an Interesting Exercise in Head Asplosion by Tobor+the+Eighth+Man · · Score: 3, Informative

      Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

    4. Re:For an Interesting Exercise in Head Asplosion by Registered+Coward+v2 · · Score: 2, Insightful

      Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

      Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

      If you don't like that reality, keep it in mind next time you vote.

      I'm not sure what he did was illegal, but the article is pretty clear that he doesn't have the resources to fight it in court, so he decided to destroy the data. Maybe someday someone with more money and time will decide to fight it, and a court will clarify the legality of scraping information.

      To me, the real question is how TOS square with robots.txt files. Given the generally accepted and followed practice of using them, does not forbidding crawling implicitly allow the data to be collected and used as the scraper sees fit?

      If you view the data as facts, then they are not copyrightable, and so aggregating them would be permissible, assuming the TOS is not binding if a scraper follows the robots.txt instructions. If that is the case, I'd guess a lot more robots.txt files would prohibit scraping.

      At any rate, I'd say the real-world rules are not very clear here, other than the one that says "avoid picking a legal fight with someone who has a ton more money and lawyers than you."

      Personally, I'd be surprised if someone else didn't already have the same data and, rather than publicize it, simply use it however they see fit.

      --
      I'm a consultant - I convert gibberish into cash-flow.
    5. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 1, Insightful

      Or we could do what America did. Violent revolution and genocidal extermination of the existing inhabitants of the lands we wish to own. That works better than voting and is a very, very American thing to do.
       
      Rule of law? You are fucking joking right?

    6. Re:For an Interesting Exercise in Head Asplosion by tibit · · Score: 1

      How does the sticking power of TOS test out in court? Do facebook's TOS actually mean anything, if all you need to do to access their site is to type in a URL? I mean there isn't even a clickthrough to have them pretend like they care. Yes, I seriously would like to know.

      --
      A successful API design takes a mixture of software design and pedagogy.
    7. Re:For an Interesting Exercise in Head Asplosion by geekoid · · Score: 4, Insightful

      Yes, but you can collect data and publish it as such: scientific data, not data in the computer sense.
      He should have kept his mouth shut, compiled the data, and then just submitted it to a number of journals. At that point Facebook would need to go after the journals, and it would have a tough time winning. Even if it did win, going after the journals would be bad PR, so there's no real win there. Its best bet would be to actually help him after the fact and look at the data to ensure that "an individual's privacy has not been violated."

      The data on social networking sites is amazing and could teach us a lot about human nature.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    8. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      What evidence do you have that the guy even agreed to the terms of service? If you put something online you give people permission to view it, and if you don't make them log in and click "I agree" then they're not bound by your terms of service. If you don't block access to it via robots.txt you give bots permission to view it, and the same applies. I'm not saying that you give up all rights to anything you put online; copyright law, trademark law, etc. still apply. But what this guy was doing violated neither of those.

      By saying he violated their terms of service they are saying that by crawling their site he implicitly agreed to their terms of service. That's like me putting a notice at the end of this comment saying "by reading this comment you agree to my terms of service"--legally preposterous. It's an issue of deep legal pockets vs. a small research team, and Facebook wanting to restrict free versions of their demographic information--because they want to be able to charge money.

    9. Re:For an Interesting Exercise in Head Asplosion by K.+S.+Kyosuke · · Score: 2, Funny

      Except Facebook is claiming he violated its terms of service (a contract), not the law.

      To me, this claim seems to be as legitimate as a public library claiming that I read too many books and threatening to sue me.

      --
      Ezekiel 23:20
    10. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 1, Insightful

      Finding something on the web does not give you the legal authority to publish and redistribute it.

      Why not? Copyright?

      Copyright law (at least in the US) does not cover data.

      Which is probably why Facebook said it was a "contract" violation.

    11. Re:For an Interesting Exercise in Head Asplosion by dubbreak · · Score: 4, Insightful

      Not really a meaningful distinction, as contract law is very much an aspect of the law.

      If he was using an account, I could see there being an enforceable contract (e.g. if you accept these terms of service, we will give you an account). If he was just crawling publicly viewable Facebook pages, then what is the consideration? I'd argue there is none and therefore no contract exists. You aren't forced to log in to view many pages, and it's not like they even have a click-through "I agree" TOS on each publicly viewable page. He broke no laws and there is no enforceable contract.

      If Facebook doesn't want people crawling publicly viewable pages, then it should make them private (login required) or at least have a robots.txt that prohibits crawling of those pages.

      --
      "If you are going through hell, keep going." - Winston Churchill
    12. Re:For an Interesting Exercise in Head Asplosion by Rantastic · · Score: 2, Informative

      Finding something on the web does not give you the legal authority to publish and redistribute it.

      Nonsense.

      Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.

      Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

      If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.

      --
      Ask Slashdot: Where bad ideas meet poor googling skills.
    13. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      I fail to see how any politician can change the color or location of the sky simply by using rhetoric and a pen.

    14. Re:For an Interesting Exercise in Head Asplosion by wprowe · · Score: 1

      Technically, any text that an individual writes on Facebook is copyrighted as their own creative work, like it or not. Redistributing it in whole would violate that individual's copyrights on their own text. Anonymous, aggregated statistics probably would not be governed by that. Whether the Facebook TOS could be imposed on anonymous crawling of the site is the real legal question one would have to answer, I guess. Is crawling the site and copying the data comparable to viewing the site? Facebook might argue that it is, and then might argue therefore that their TOS are enforceable.

    15. Re:For an Interesting Exercise in Head Asplosion by davidwr · · Score: 1

      Practical answer: Next time, do your research overseas.

      Commentary: It's sad when you have to do legal forum shopping before starting your research.

      --
      Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    16. Re:For an Interesting Exercise in Head Asplosion by The+Moof · · Score: 2, Interesting

      but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...

      The summary says the crawler simply indexed public information. Why is this relevant? Well, recently I noticed that Facebook Apps, all of which I have disabled and blocked via my privacy settings, have started accessing my information again. Naturally, I assumed something got reset and started hunting for the settings again. Then I found this new block of text in all of their privacy settings:

      When you visit a Facebook-enhanced application or website, it may access any information you have made visible to Everyone as well as your publicly available information. This includes your Name, Profile Picture, Gender, Current City, Networks, Friend List, and Pages. The application will request your permission to access any additional information it needs.

      So they claim they can't stop people from acquiring and using my 'publicly available' information, because it's open to the public. Then, they turn around and go after this guy for indexing and using the same 'publicly available' information.

      It all sounds a little two-faced to me.

    17. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      But you don't give them the right to COPY it. He has the right to view it sure. But to make copies? Nope. The stuff is copyrighted. So what he did was not legal.

    18. Re:For an Interesting Exercise in Head Asplosion by Svartalf · · Score: 1

      robots.txt requires that a crawling app HONOR said file.

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
    19. Re:For an Interesting Exercise in Head Asplosion by crashumbc · · Score: 2, Informative

      Unless something has changed, you have to "log in" to see anything on Facebook. Even if a page is "public", you can't view it without logging in with your own account.

      A crawler may or may not bypass that...

    20. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      Does Facebook actually have a copyright on its entire dataset, or do they just say "oh yeah, see that copyright symbol? You can't use our data"? Or do they simply say "we own everything on our servers, and no one else can use any of the information in any way we don't like"?

      The first case is just funny. It reminds me of when people would write copyrights at the bottom of their personal webpages and think they somehow had ownership of the information.

      The second case presents a somewhat credible argument, except that it doesn't make any sense to me personally.

      In either case, I've always been of the opinion that the internet is like an extension of the old bulletin board systems, which were an electronic extension of actual bulletin boards, which worked under the premise that anything on them was made available for public use. If you put something up on that BBS, then it's your responsibility to control who has access to the data. If someone is intelligent or devious enough to break your control, then it is still the poster's fault for putting something they didn't want public on a public system.

      I think the control over the internet that companies and governments are attempting will only be achieved once they realize that the internet and anything on it is public domain. They can then start to develop proper controls for the data they want to remain hidden from public use.

    21. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 1, Insightful

      But if robots.txt disallowed crawling, then Facebook would be able to show that their intent was to not allow this type of data access.

    22. Re:For an Interesting Exercise in Head Asplosion by tibman · · Score: 1

      robots.txt just gives you advice; nobody is required to follow it. http://en.wikipedia.org/wiki/Robots.txt#Disadvantages

      That's my understanding of it, anyway. Maybe when it becomes a real standard they can do more with it?
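The advisory nature of robots.txt can be illustrated with Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming a made-up robots.txt (not Facebook's actual file) and a hypothetical crawler name `ExampleResearchBot`: the crawler itself parses the file and decides whether to fetch, so the check only happens if the crawler's author chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; not Facebook's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(url: str, user_agent: str = "ExampleResearchBot") -> bool:
    """Return True if the sample robots.txt permits user_agent to fetch url.

    Nothing in HTTP enforces this check: a crawler that skips it can still
    request the URL. Honoring robots.txt is a choice the crawler author makes.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/people/alice", "https://example.com/private/settings"):
        print(url, polite_fetch_allowed(url))
```

A crawler that never calls `can_fetch()` can still request `/private/settings`; only server-side access control (for example, requiring a login) would actually stop it.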

      --
      http://soylentnews.org/~tibman
    23. Re:For an Interesting Exercise in Head Asplosion by blahplusplus · · Score: 1

      "he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law."

      Which are bought and sold by lobbyists. The law is such a joke because it always kowtows in some way or another to private interests.

    24. Re:For an Interesting Exercise in Head Asplosion by gorzek · · Score: 1

      Copyright is not absolute. Phone books, for instance, are not copyrighted because they are collections of facts--namely, addresses and phone numbers.

      Likewise, he could copy all sorts of factual information about the users on Facebook: their names, contact information, friends, etc. He could likely not get away with copying their photos, status updates, and so forth since those can constitute creative works and are thus copyrighted.

      Nevertheless, just because something is online doesn't mean it's automatically copyrighted. Facts themselves are not.

      That's most likely why Facebook went after him using the TOS claim rather than a copyright infringement claim.

    25. Re:For an Interesting Exercise in Head Asplosion by elnyka · · Score: 1

      How does the sticking power of TOS test out in court? Do facebook's TOS actually mean anything, if all you need to do to access their site is to type in a URL? I mean there isn't even a clickthrough to have them pretend like they care. Yes, I seriously would like to know.

      Those are excellent questions that need to be resolved either amicably or in a court of law (which is what was going to happen). The latter is expensive. Unless you have some powerful backer$, you can't do it alone (though it raises the question of why he didn't contact the EFF in the first place).

    26. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      What is Google doing then? They cache most pages they "find on the web".

    27. Re:For an Interesting Exercise in Head Asplosion by CAIMLAS · · Score: 1

      Finding something on the web does not give you the legal authority to publish and redistribute it

      At the same time, if he never agreed to the EULA (and they did not require him to do so in order to read the content) then he's probably over-reacting in deleting the data. What laws might he be breaking, here? I'm not aware of any - though he was certainly setting himself up for wanton litigation on account of the bad publicity.

      This isn't wanton publishing of said data. It's a 'derivative work'. Think of someone canvassing an area to record which kinds of grass are seeded in people's yards (and how well they're growing) and selling that information.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    28. Re:For an Interesting Exercise in Head Asplosion by mwvdlee · · Score: 1

      Assuming all those profiles were indeed publicly available without having to log in to facebook, how could he have ever violated terms of service if he never agreed to any terms of service?

      Am I to assume that anybody that has the misfortune to view a facebook profile without being a facebook member is automagically in violation of facebook's terms of service?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    29. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      He could have done that, yes. But there's no guarantee that they would be able to help and even if they could it would be an amazingly huge pain in the ass from any point of view.

      Once again our bullshit legal system allows companies to bully innocent citizens into submission. The solution: Bomb facebook headquarters.

    30. Re:For an Interesting Exercise in Head Asplosion by Hurricane78 · · Score: 0

      Yes it does give you the authority! Do you know nothing about how servers work?

      If you look at a page on the web, you send a message to the server, asking “could you please give me that page there?”
      And the server then can decide under what conditions it honors your request.
      These rules are decided by the site hoster upon installation.
      If the server gives you the page freely, and without any conditions (which nearly all web servers do), then you can do with it whatever you want.

      Or in short: you passed it on. You split control. If you wanted something, or some rules, you should have demanded them. Now it’s too late, so quit bitching! Maybe you’ll learn something from it for next time.

      Whatever the laws of some fucked-up pseudo-government say does not change a thing about those basic, physics-based rules of reality.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    31. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      EULAs and whatnot have been upheld by US courts, and you'd better believe Facebook lawyers know exactly which judges agreed and which district to file in.

    32. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      Voting is a waste of time; get elected to office or hire lobbyists if you really want to see political change.

    33. Re:For an Interesting Exercise in Head Asplosion by roseblood · · Score: 1

      Since this is publicly available data, and all the guy did was send an automated web browser to download it, does this mean Facebook has threatened him with a lawsuit for doing what every visitor to Facebook already does? Granted, he likely did it much faster than any individual has. It's just wonky.

      --
      There are lies, damned lies, and statistics.
    34. Re:For an Interesting Exercise in Head Asplosion by clone53421 · · Score: 2, Informative

      They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

      Here.

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    35. Re:For an Interesting Exercise in Head Asplosion by AtlantaSteve · · Score: 1

      Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

      About once a month or so, somebody comments about a landlord doing something abusive to a tenant, or a school district violating its student's rights, or citizens being wronged by the police or Feds. Somebody else always responds, "Why don't they just get the ACLU to step in?"

      True, the ACLU has over a half-million members. Between its political advocacy and charitable foundation arms, it has over $120 million per year of revenue. Yet STILL, it can only be bothered to get involved with a tiny subset of possible cases. It is ultimately a political and public policy organization, and has to pick and choose the battles that will give it the biggest strategic impact with the smallest possible commitment of resources.

      Meanwhile, the entire EFF staff could comfortably hang out at my house and watch a football game. Their annual revenue of $3.4 million is less than the average cost of ONE patent infringement lawsuit... and moreover, they're running a $400k annual deficit over there right now.

      There's a reason why the EFF isn't lead counsel on any multi-million dollar pro bono lawsuits, and it's not because they "didn't get the memo". Rather, it's because that's like asking your local Boy Scout troop to fly down and singlehandedly fix Haiti. It bugs me when people assume that lawyers magically grow on trees to fight their political crusades for free. There is very little real support for things like the EFF compared to Slashdot babble about software patents, stealing MP3s, cracking video game DRM, and all of the other things people do to make-believe that they are "heroes of the revolution" or some such laughable bullshit. More people should put at least a few bucks where their mouths are and lend some support.

    36. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      I'm guessing he created an account at some point and by that agreed to the TOS. I still don't see it as a contract. They are free to cancel his account for violating the terms, of course.

    37. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      From the summary, it sounds like the data was scraped from 210 million pages.
      What journal do you suggest? Raw HTML Monthly?

    38. Re:For an Interesting Exercise in Head Asplosion by tsm_sf · · Score: 1

      It's not two-faced at all. One group is providing Facebook with some form of compensation, and the other is not.

      Since money is more important to Americans than a crying eagle with 'Liberty' down one wing and 'Freedom' down the other, this shouldn't come as a gigantic shock.

      --
      Literalism isn't a form of humor, it's you being irritating.
    39. Re:For an Interesting Exercise in Head Asplosion by gknoy · · Score: 1

      robots.txt requires that a crawling app HONOR said file.

      I believe you mean "suggests" or "requests", rather than "requires". Well-behaved robots do obey robots.txt, but even so, if a URL is accessible, it should be either secured or considered publicly available.

      Unfortunately, I believe some courts have considered modifying URLs to be "hacking", in that it's "unauthorized access", simply because the server owners *thought* it was inaccessible, rather than actually protected. I hope such lunacy doesn't last.

    40. Re:For an Interesting Exercise in Head Asplosion by severoon · · Score: 1

      So Facebook is claiming that their ToS is binding for anyone that can view the publicly available profile information they're posting on the web for all to see? (Basically, anyone with a web connection?)

      I shall now claim everyone in existence is bound by my ToS, then, b/c I too have a web page on the intarpip3s. And you are in violation of it, sir/madam. Pay!

      --
      but have you considered the following argument: shut up.
    41. Re:For an Interesting Exercise in Head Asplosion by NotBornYesterday · · Score: 1
      --
      I prefer rogues to imbeciles because they sometimes take a rest.
    42. Re:For an Interesting Exercise in Head Asplosion by dubbreak · · Score: 1

      I'm guessing he created an account at some point and by that agreed to the TOS.

      I assumed otherwise, but yes, that could have been the case (I'm assuming neither of us RTFA).

      I still don't see it as a contract.

      That's the fun of law. If you can successfully argue that there was no consideration (something of value exchanged for something else of value), then yeah there was no contract. I think that is a more difficult argument than assuming he didn't create an account. If I were a researcher I would have just accessed public data for my research.

      They are free to cancel his account for violating the terms, of course.

      Yep, assuming he has one. I question what other damages could be assessed anyhow. What actual damage was done by his research? It's definitely not libelous so what else is left?

      Even if a judge could be convinced that punitive damages are in order (i.e. that he needs to be punished for his actions, in this case using publicly viewable data for research and publishing said research), they should be nominal. It's not as though he was trying to put himself above the law (his actions weren't egregious).

      --
      "If you are going through hell, keep going." - Winston Churchill
    43. Re:For an Interesting Exercise in Head Asplosion by dubbreak · · Score: 1

      The point I was trying to make about robots.txt wasn't that it would make it illegal (I really doubt it does); it's that there wasn't one to even suggest that the pages shouldn't be crawled. There was nothing to prevent access and nothing to suggest one shouldn't.

      I don't know about you, but if I don't want people looking at my house I'll grow a huge hedge so people have to trespass to see it. If I don't want people to access my webpages I'll make them private so they have to hack or have an account to see them (in which case they are either violating the law or TOS).

      Facebook probably got exactly what they wanted. The guy crumbled under the first legal threat. Anyone can make a legal threat and even have lawyers deliver said threat. It doesn't mean the threat is legitimate.

      --
      "If you are going through hell, keep going." - Winston Churchill
    44. Re:For an Interesting Exercise in Head Asplosion by shentino · · Score: 1

      In practice if you're a corporation fighting a user, you always win unless that user has another corporation backing him up (such as the EFF).

      Having the ability to drag someone out in court is powerful incentive for your opponent to fold.

    45. Re:For an Interesting Exercise in Head Asplosion by shentino · · Score: 1
    46. Re:For an Interesting Exercise in Head Asplosion by shentino · · Score: 1

      A robots.txt file is the internet equivalent of a "No Trespassing" sign.

      Access controls are more like locked gates.

      They both make it illegal to enter without permission, but only the second one actually prevents it.

    47. Re:For an Interesting Exercise in Head Asplosion by sgbett · · Score: 1

      This is an interesting point. Is it fair to say that modifying a URL is an intentional act and not an accident?

      I'm thinking of a scenario in which a user modifies a URL, say by changing a user ID, to get access to another person's information. Of course the site should prevent this, but if it doesn't, can it be said that no liability lies with the user doing the URL modifying?

      Not trolling, interested in viewpoints. Sane ones preferably!

      --
      Invaders must die
    48. Re:For an Interesting Exercise in Head Asplosion by AHuxley · · Score: 1

      Move to a part of the world where collecting "factual information about the users on Facebook" is not a problem.
      Create a crawler and let him do his research.
      It seems that in the US a whole complex set of protections has grown up around the sorting, indexing and selling of data.
      The idea that a set of lower-cost computers can now do the work behind their 'revenue' stream might be upsetting.
      What will the crawler find? Spies, random law enforcement, fake users, complex astroturfing, long-term federal task forces, honey pots?

      --
      Domestic spying is now "Benign Information Gathering"
    49. Re:For an Interesting Exercise in Head Asplosion by AHuxley · · Score: 1

      robots.txt requires that a crawling app HONOR said file.
      Think of it more as a 'do not display to public' flag.

      --
      Domestic spying is now "Benign Information Gathering"
    50. Re:For an Interesting Exercise in Head Asplosion by Anonymous Coward · · Score: 0

      Unless you have disallowed it, chances are there is quite a bit available on your public profile that DOES NOT require a login.

      Just log out of Facebook and do a Google search for 'someones name Facebook' and you'll see a stack of public profiles pop up.

    51. Re:For an Interesting Exercise in Head Asplosion by Evil+Grinn · · Score: 1

      What facebook lets you see without being logged in is extremely limited. It's very unlikely he could collect a useful amount of info without an account.

    52. Re:For an Interesting Exercise in Head Asplosion by Evil+Grinn · · Score: 1

      Ok I RTFA and it does say he did it without logging in. In which case, the information he had was pretty limited and I'd be surprised if Facebook had any reason to fear competition with any scheme of their own to sell the data.

    53. Re:For an Interesting Exercise in Head Asplosion by tibman · · Score: 1

      Ah, OK, gotcha. My mistake. Yes, I agree with you; it's too bad the threat of legal action is so scary. I don't blame the guy.

      --
      http://soylentnews.org/~tibman
    54. Re:For an Interesting Exercise in Head Asplosion by Intron · · Score: 1

      But you don't give them the right to COPY it. He has the right to view it sure. But to make copies? Nope. The stuff is copyrighted. So what he did was not legal.

      From Facebook's terms:

      "You own all of the content and information you post on Facebook, and you can control how it is shared through your privacy and application settings."

      So how can FB sue the guy for copyright? They have no standing.

      --
      Intron: the portion of DNA which expresses nothing useful.
  2. If Facebook had done this... by John+Hasler · · Score: 4, Insightful

    ...you'd be flaming them for invading your "privacy".

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    1. Re:If Facebook had done this... by Anonymous Coward · · Score: 1, Insightful

      If Facebook had released this information we would be flaming?

      They did and we still are.

      (yes)

    2. Re:If Facebook had done this... by 2obvious4u · · Score: 5, Interesting

      Isn't this the golden egg of Facebook? I thought this is what they were selling. That data is fascinating: it is completely anonymous, yet at the same time very insightful for marketing purposes. I think Facebook is just upset because they plan on selling the same data that Pete was.

    3. Re:If Facebook had done this... by Altus · · Score: 5, Insightful

      why do you think they threatened him? they want to sell this data themselves.

      --

      "In America, first you get the sugar, then you get the power, then you get the women..." -H. Simpson

    4. Re:If Facebook had done this... by moteyalpha · · Score: 1

      It seems that many of these data sets are public and easily accessible to analysis. I would find it interesting to simply use various forums like slashdot and have a ranking of who had the most insightful comments by user name. Certainly the data is available as people make it so. It seems that there is a schizophrenic aspect to this, people want to be recognized for what they represent and when they become too famous they get nervous about it.
      I am sure that much of this data is already available in an organized form in many places like Google analytics.
      I want to know who is the biggest Karma whore, how many times is XKCD linked , and why does that guy named AC have a fascination with goats.
      It is also possible to look at commercial pages and identify which ads are placed and then determine who is spending the most money on ads. It then becomes a tool to see what competition is doing. I suspect that much of this data is already available to a number of organizations by a simple database query.
      So if China, Russia, NSA, Iran... store every bit of this info in "secret" databases that respect no bounds, and the public has no access to it, are we being cheated by being too private?

    5. Re:If Facebook had done this... by anglico · · Score: 1

      exactly! I wish I had mod points, oh wait you're already at 5! I was expecting to read a whole list of complaints against this practice when I started reading the comments, and was surprised to say the least.

    6. Re:If Facebook had done this... by NeutronCowboy · · Score: 4, Interesting

      Most likely. Facebook's gold mine isn't even so much the user information itself - it's the networks that they can build out of the relationship data. As of right now, they haven't figured out how to make money from it, but they certainly aren't going to let someone take the most valuable aspect of their system - the network information - and put it out in the open.

      Personally, I hope someone does the same work, but uploads the raw data anonymously to a torrent somewhere.

      --
      Those who can, do. Those who can't, sue.
    7. Re:If Facebook had done this... by Late+Adopter · · Score: 1

      Except Pete can't actually sell the data, that would be a derivative work of their copyrighted web-pages. Sure he has the fair-use ability to publish academic studies, but he'd be limited to using the data internally.

    8. Re:If Facebook had done this... by Anonymous Coward · · Score: 0

      I'm a little confused because I have definitely seen the facebook data used in research presented at conferences in the last few years, and had just presumed that it was an official version (granted it was on the level of connections rather than much info, but still). Real world data is very useful in analyzing mathematical models of how that data should behave. I must admit to getting a kick out of Hitler being the most frequent joint appearance in film - so many movies use clips of his speeches.

    9. Re:If Facebook had done this... by Anonymous Coward · · Score: 0

      +5 Insightful? More like +5 Idiot.

      I'm going to explain to you a little something you should already know. In fact, I hope for your own sake that you DO know this and are just being disingenuous, otherwise you really are a moron.

      Ready? Here goes:

      Those of us who are critical of Facebook (and other such services) on privacy grounds generally ASSUME that Facebook already collects such data AND HAS BEEN DOING THIS SINCE DAY ZERO. After all, said data is collected by their service on their servers. Do you think they may be, you know, gathering it? This assumption, moreover, IS PRECISELY WHY we claim that the Facebooks of this world raise serious privacy concerns.

      I'll go even further, in fact, and claim that a big part of the reason it is OK for this guy to do this is that FACEBOOK IS ALREADY DOING IT, and the values of transparency and democracy that most of us embrace would be better served by such data being public.

      So you see, your lame attempt to expose our hypocrisy only results in a display of your own idiocy and moral midgetry.

      The scare quotes around "privacy" just act as the cherry on top. To that part I can only comment: "Christ, what an asshole".

  3. Facebook *did* do this by Chirs · · Score: 5, Insightful

    I see very little problem with an automated scan that respects robots.txt.

    By not blocking automated access to the profiles, facebook is squarely at fault.

    1. Re:Facebook *did* do this by sexconker · · Score: 0, Interesting

      I see very little problem with an automated scan that respects robots.txt.

      By not blocking automated access to the profiles, facebook is squarely at fault.

      I see very little problem with an automated scan that doesn't respect robots.txt. (As long as it's accessing stuff normal people can get to.)

      Anything a machine can do, a meatbag can do, though usually more slowly.
      Most anything a meatbag can do, a bunch of meatbags can do much more quickly.

      Robots.txt says go away? Amazon's Mechanical Turk says Thank You, Come Again.

    2. Re:Facebook *did* do this by Inda · · Score: 1

      Back in the late nineties I wouldn't have thought twice about downloading a whole site. It wasn't unusual. I had a program for doing it, although I believe the popular browser of the day had a feature that saved a good portion.

      My, how things have changed. GOML.

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
  4. Yes, by all means, let's stamp out... by jeffb+(2.718) · · Score: 3, Insightful

    ...all the researchers who do everything in the open and with proper anonymization.

    1. Re:Yes, by all means, let's stamp out... by jeffasselin · · Score: 1, Interesting

      You assume such anonymization is actually possible, I somehow doubt it.

      --
      If he explores all forms and substances Straight homeward to their symbol-essences; He shall not die.
    2. Re:Yes, by all means, let's stamp out... by geekoid · · Score: 1

      It is, and it's done all the time in the scientific community.

      I don't see why you would think removing people's names isn't possible.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    3. Re:Yes, by all means, let's stamp out... by Anonymous Coward · · Score: 2, Interesting

      Even with names removed, data like this can often be traced back to the person. Your name isn't the only unique thing that appears in your facebook profile.

      As an example, how many others share your permutation of friends and fan pages?
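A toy illustration of the point above, assuming an "anonymized" dump that swaps names for opaque IDs but keeps each person's set of fan pages. The records and IDs are invented, and this is a generic exact-match linkage check, not how Warden's data was actually processed: anyone who already knows a person's fan pages can pick out that person's record whenever the combination is unique.

```python
from collections import Counter

# Toy "anonymized" records: names replaced by opaque IDs, but each record
# still carries the set of fan pages the person liked. All data is made up.
records = {
    "u1": frozenset({"Cats", "Jazz", "Chess"}),
    "u2": frozenset({"Cats", "Jazz"}),
    "u3": frozenset({"Cats", "Jazz"}),
    "u4": frozenset({"Hiking", "Chess", "Opera"}),
}


def unique_fraction(records: dict) -> float:
    """Fraction of records whose attribute set appears exactly once."""
    counts = Counter(records.values())
    unique = sum(1 for attrs in records.values() if counts[attrs] == 1)
    return unique / len(records)


def reidentify(records: dict, known_attrs: set) -> list:
    """IDs whose record exactly matches attributes an outsider already knows."""
    target = frozenset(known_attrs)
    return [rid for rid, attrs in records.items() if attrs == target]


print(unique_fraction(records))                           # 0.5
print(reidentify(records, {"Hiking", "Chess", "Opera"}))  # ['u4']
```

Real datasets are far larger, but the number of possible friend and fan-page combinations grows even faster, which is why uniqueness tends to stay high.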

    4. Re:Yes, by all means, let's stamp out... by thePowerOfGrayskull · · Score: 2, Informative

      Removing names isn't necessarily enough. The recent Netflix case shows that. I think it's interesting that nobody catches the broader implications of that discussion, namely that whether they're "anonymizing" data for purposes of providing it for research, or selling it for marketing, the ability to reverse engineer patterns to undo it remains a risk.

    5. Re:Yes, by all means, let's stamp out... by Rob+the+Bold · · Score: 1

      You assume such anonymization is actually possible, I somehow doubt it.

      If it can only be done clandestinely, then you should definitely doubt it. On the other hand, if it's done above the covers with the lights on, you could evaluate the anonymity of the data yourself. Of course, it would be too late then; you might want to review the anonymizing scheme before the data is posted. Again, an above-board researcher is more likely to submit his scheme for peer review or, perhaps even better, use a known good system. If you're really certain that such a claim is bogus, and you can back it up, you can publish your own results to that effect.

      --
      I am not a crackpot.
    6. Re:Yes, by all means, let's stamp out... by Ed+Peepers · · Score: 1

      You are entirely correct. Social scientists don't need names or addresses (IP or physical). We can figure out who you are with a frighteningly small number of data points. Doubly so for individuals who have a public Facebook profile and therefore probably have public profiles elsewhere on the net. I suspect that this guy, despite his best intentions, did NOT anonymize the data well enough to hide at least 80% of the users.

  5. Publicly available by mdsharpe · · Score: 5, Interesting

    Since this is publicly available information, and all he did was send a program to go grab it (much akin to asking your web browser to download it), does this mean Facebook has essentially threatened him for no more than reading too much of Facebook too quickly? Sounds absurd to me.

    1. Re:Publicly available by CoffeeDog · · Score: 2, Insightful

      Just because something is publicly available doesn't mean just anyone is free to reproduce and distribute it. In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them. By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.

    2. Re:Publicly available by Trepidity · · Score: 1

      You can't copyright facts though, so it's not clear they would own the dataset, depending on how it were created. For example, while Facebook owns the actual literal webpages on facebook.com, it's questionable whether they own the friend graph, which is simply a fact about how people choose to associate themselves.

    3. Re:Publicly available by NeutronCowboy · · Score: 1

      Not really. It means that Facebook needs to have some data publicly available for users to browse, but that it can't let people take that data out of the Facebook realm. In other words, Facebook knows exactly what it is doing, and is acting in both cases in its best interest.

      Now, does that mean that Facebook's approach makes sense, and would stand up in court? I doubt it, but I don't have the cash to test that theory. Which in turn means that the outcome was just as predictable: Facebook makes up random rules and requests, and they stand because most people don't have the resources to challenge the lawyer army of a successful corporation.

      --
      Those who can, do. Those who can't, sue.
    4. Re:Publicly available by mdsharpe · · Score: 1

      I see your points, and I guess it's the redistribution that's the main issue. Facebook clearly sees their users' activities as valuable property.

    5. Re:Publicly available by prostoalex · · Score: 1

      Disclaimer: I work for the company mentioned in the article, not in legal role though.

      Privacy is dynamic and "publicly available information" is not set in stone - user could've chosen to hide specific bits of that information a few minutes later, and there doesn't seem to be any update protocol to remove those bits from the scraped DB.

    6. Re:Publicly available by Anonymous Coward · · Score: 0

      Precisely.

      Facebook owns the copyright to their website. Their layout. Their programming. But your images, your "friendships", your relational data, your Twitter-like "what's on your mind" postings, messages, etc. etc. etc. are yours and yours alone. They do not belong to Facebook, and Facebook's terms and conditions, for the most part, reaffirm that.

      The problem is, Facebook's not playing by their own rules.

      They're essentially saying they own their users.

      I could see if the guy copied Facebook's data collection methods, or ripped off anything else that's under copyright, by all means, take him down. But to say that gathering freely available, anonymous data that can be construed as common knowledge with a unique data collection algorithm, then organizing it and selling it to people, violates any law is frankly not right. Because by those terms, any stock quoting software can have many, many Fortune-500 companies suing the daylights out of them.

      Crap, I think I just gave them all an idea... let's hope they don't read /.

    7. Re:Publicly available by Anonymous Coward · · Score: 0

      If facebook wants to sell that data, then they must build up some kind of artificial ownership construct to enforce scarcity of that data. Otherwise, whoever else gathers and sells it first wins.

      I don't see why they would care about things like law, public availability, or sense. They have their goal (use this data to make money), and they set out to achieve it (they must in some way come to own the data). It doesn't matter a whit to them how it's done--they simply must own the data by hook or crook.

    8. Re:Publicly available by mdsharpe · · Score: 1

      This is a good point. However, does a similar thing not occur in browsers caching data as one surfs the web?

    9. Re:Publicly available by cdrguru · · Score: 1

      By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.

      Legal, maybe not. But it happens every day over the entire planet. And there doesn't seem to be any reasonable way to stop it, so it is going to continue forever.

      Redistribution is the key to the new digital un-economy.

    10. Re:Publicly available by Dhalka226 · · Score: 1

      In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them.

      If Facebook needs to write into their TOS an implied permission to distribute users' data, it says to me that the owners of such data are the users themselves. That being the case, Facebook wouldn't have any standing to make demands about what is done with that data by third parties; that would be the individual users' problem insofar as any of the data might be subject to copyright at all. (Most of it I assume is not, since it is purely factual.)

    11. Re:Publicly available by Phrogman · · Score: 1

      This is more like going into a public library and writing down a list of all the books they have by title, ISBN, placement on the shelves, publisher etc, and then relating that information to show connections between the books. It's all publicly available information and anyone can walk in and look at it, write it down etc.

      The difference here is that Facebook is providing its services free to the public so that THEY can go grab all this information and turn it into a dataset they can sell to corporations that want access to our private information. It's just that Facebook's developers have been too stupid to prevent someone from being able to do so, and remiss in not updating their robots.txt (not that a robots.txt file has much effective force).

      --
      "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
  6. chilling effect by Anonymous Coward · · Score: 5, Insightful

    Don't see Facebook going after Google, even though the data that they possess is ostensibly the same as Warden's. The primary difference that I see is that Warden was offering analysis and results for free, not trying to monetize it. Maybe that's what made them mad.

  7. gray-market black-market by h00manist · · Score: 1

    All data that exists, and someone can sell somehow, is for sale somewhere, somehow. That's the law of money, which is rather strong. So forget the right to privacy law; it hasn't been working for a long time now. There is no way to enforce it; just like the law prohibiting drugs, it just doesn't work. I don't know the solution, or if it's good or bad, but that's the situation, like it or not. Wikileaks, for example, is a result of this.

    --
    Build your own energy sources from scratch. http://otherpower.com/
  8. Facebook is evil by trurl7 · · Score: 1

    Besides the obvious (wasting time, too much info being shared with future employers), their privacy and data policies have gotten worse and worse. Once you sign up with them, they own everything you do. Or at least so they believe. From his writing, this researcher was quite open and tried to be as forthcoming as possible. If they had concerns over anonymity, I suspect he would have been happy to discuss the exact data-scrubbing procedure to make sure it's on the level. But instead, these turds reach for the lawyers.

    So it's fine for search engines to cache this data. It's fine for marketing firms to use it to pester even more people. But the moment the researchers get in on it - oh noes, gotta stop that shit from happening.

    With any spare time, I'd sit down, recreate the damn dataset and post it to every torrent site in the world. Let's Streisand these jerks!

  9. So by fulldecent · · Score: 1

    (not that it was actually destroyed), but why destroy the dataset? Just post to slashdot, wait for someone to send you a link to chilling effects or eff, then follow up with chilling effects or eff, then release the dataset.

    --

    -- I was raised on the command line, bitch

  10. Very interesting by Bearhouse · · Score: 2, Informative

    I'll let others debate the 'privacy' issues; (personally I think there's nothing wrong with scraping profile information that people have explicitly made 'public')
    Anyways, just check what he did with it; very interesting: (FTA)
    http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html
    There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome

    1. Re:Very interesting by Bearhouse · · Score: 2, Funny

      ahem, put 'to', of course...

    2. Re:Very interesting by dangitman · · Score: 1

      There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome

      By "NIH syndrome," I assume you're referring to "Not Invented Here." I don't really see what that has to do with this case.

      --
      ... and then they built the supercollider.
    3. Re:Very interesting by Bearhouse · · Score: 1

      Correct on NIH.
      Well, if they were smart, Facebook would already be marketing this data, and/or services based on it, to their users and others.
      One could imagine all kinds of apps; "hey, 20% of your friends are in town 'x', why not go there for a weekend"
      The links to business could be huge, too...
      "Hey, here's a hotel you could stay in..."
      If they proposed those kinds of things, instead of asinine games, then maybe I'd be prepared to take them more seriously, (and not have a problem with their using my 'public' data...)

    4. Re:Very interesting by dangitman · · Score: 1

      The thing is, I just don't understand why you would use "NIH Syndrome" in this context. That is usually used when somebody in Company X says "Hey, why don't we use this awesome technology to make a better product," but is rebuffed by Company X because the technology was invented by company Y.

      In this example, there is no new technology involved, and Facebook already has the data. What is "not being invented here"? Facebook already invented Facebook, how is Facebook using the data they generated inventing something that Facebook didn't already invent?

      Facebook.

      --
      ... and then they built the supercollider.
    5. Re:Very interesting by Anonymous Coward · · Score: 0

      Most people don't understand the issues with public profiles, or understand the privacy issues with the internet in general.

      They think of these sites as something akin to their own backyards. Sometimes they invite some folks over for a bbq, other times they sunbathe nude.

      And they assume their neighbors won't be peeping on them and selling the pictures.

      On the internet, it's big giant corporations like Google who are doing the peeping and selling the pictures to the highest bidder.

  11. Facebook does stuff like this a lot by TheSpoom · · Score: 5, Interesting

    They did something similar to FB Purity, a Greasemonkey script that allows users to filter out apps and other stuff they don't want to see in their feed. Facebook argued that they were misusing their "FB" trademark... eventually they let them continue under the name "fluff busting purity", probably due to the PR backlash that shutting them down would bring.

    They've also shut down the Facebook portion of the Web 2.0 Suicide Machine, which runs scripts that allow a user to delete their social profiles as thoroughly as sites will allow. In that case, they argued that the Suicide Machine was violating their "Statement of Rights and Responsibilities"... which isn't even a law! Nonetheless, the Suicide Machine didn't have the financial ability to fight even frivolous claims like that, so they folded that section.

    Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

    --
    It's better to vote for what you want and not get it than to vote for what you don't want and get it.
    - E. Debs
    1. Re:Facebook does stuff like this a lot by Anonymous Coward · · Score: 0

      They're not.

    2. Re:Facebook does stuff like this a lot by Anonymous Coward · · Score: 5, Insightful

      They're not wrong though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself; all I've done is remove everything from my profile and just use it as a hub to stay in contact with people all around me, I haven't gone as far as stopping using the site, and I don't think I will. Nor will many people.

    3. Re:Facebook does stuff like this a lot by flabordec · · Score: 2, Insightful

      Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

      I'm afraid the average Facebook user is a teen who is more worried about getting a higher score in whatever Flash game she is currently playing than about FB's access policies for computers.

      --
      "I see undead people" Warcraft III - Necromancer
    4. Re:Facebook does stuff like this a lot by ZekoMal · · Score: 1
      I left the site. Well, I tried to. At first, they told me that I could only "suspend" the account; ie, people could still send me stuff and FB kept ALL of my data. Outraged, I tried to find an alternative.

      Surprise, surprise. After digging through their FAQ I found an obscure part of it that said you could permanently delete. Here's the problem with it. After you agree to permanently delete, it stays up for two weeks. If you log in even once, it undoes the delete option. Furthermore, there is no guarantee anywhere that your data is actually gone.

      I'm never one to scream "sue"...but if I can't confirm that my data is off of their useless website, I am fucking suing.

    5. Re:Facebook does stuff like this a lot by Anonymous Coward · · Score: 0

      I left. No sweat off my back.

    6. Re:Facebook does stuff like this a lot by ZekoMal · · Score: 1
      This. I tried to convince three friends to quit FB, and they were vehemently against it.

      Three different reasons given:

      1. I have nothing to hide, so why not share everything with everyone?

      2. My privacy settings are on, so it's okay.

      3. I don't care, I want to keep in touch with my friends that live in the same dorm that I also text obsessively and eat every meal with.

      My generation is as anti-privacy as they are anti-copyright; they hate the establishment but love giving said establishment all of their data.

    7. Re:Facebook does stuff like this a lot by Anonymous Coward · · Score: 0

      Even better... not only do people not leave Facebook, they protest the change in policies by making Facebook groups about how they hate Facebook. (And I think it's safe to say that Facebook doesn't care if those groups exist as long as the users stick around.)

    8. Re:Facebook does stuff like this a lot by CAIMLAS · · Score: 1

      It's probably something to do with the fact that: eh, you can:

      1) leave the site and have them keep all the data, while at the same time not be able to view your friends' profiles again
      2) stay

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  12. On what grounds? by adam.skinner · · Score: 1

    Legal action? On what grounds, and for what damages? What did this guy have to fear? Jail time? Court imposed fines? He doesn't need a lawyer to defend him in this.

    1. Re:On what grounds? by Anonymous Coward · · Score: 1, Interesting

      This is America; defending yourself in court against a lawyer is legal suicide. I could argue that Cyanide is lethal and Dynamite is combustible in an American Court but if I were up against a lawyer I guarantee I would lose. Despite the fact that these are practically indisputable facts, the American Court System is set up so it is impossible to argue respectably without paying the Lawyer Tax.

      Example:
      1.) I go into court and argue that Cyanide Brand X should carry a "Poison" label.
      2.) Theoretical makers of Cyanide Brand X hire 5 lawyers, because they can.
      3.) Lawyers state as defendant they wish to have a trial by jury (a right guaranteed by the constitution, called a Jury of your Peers)
      4.) Jury selection weeds out anyone with previous knowledge of the effects of Cyanide, and anyone with background in biology or chemistry because they would not be impartial.
      5.) The result is a jury of people who are completely un-knowledgable and as such completely persuadable either way.
      6.) The Lawyers of Cyanide Brand X bring in a variety of "Expert Witnesses" who are of course "compensated for their time" and who state that no, Cyanide doesn't kill you.
      7.) Because the Jury is 100% impartial and also 100% uninformed besides what they have been told in court, their only choice is to assume these Paid or Compensated "Expert Witnesses" were correct because they are scientists!
      8.) The result is that you lost a case arguing what should have been a foregone conclusion to begin with, because somebody brought more money and lawyers than you.

    2. Re:On what grounds? by cdrguru · · Score: 2, Informative

      If your position in entering the above motion was that "I'm right, so I should win" and offered nothing else - such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.

      The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.

      You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone that doesn't know computer programming and having them debug a program in an Assembler language.

      Don't have the time to learn all this stuff? Well, that is why we have lawyers.

  13. Robots.txt is insufficient. by way2trivial · · Score: 4, Interesting

    I'm sorry- it is..

    robots.txt allows you to "refuse a specific named bot" or "refuse everyone" or "allow everything" or "allow these directories" or "only allow these directories"
    (want a fascinating read? try robots.txt at your favorite government site- whitehouse.gov used to be fascinating stuff)
    there is no way in robots.txt to permit crawling based on intent of information use like a CC license does

    I can- with photographs, have a creative commons license that sez "use it for anything" "use it with credit to me" "free for non-commercial" etc.
    I would WANT google to see my site, I would want bing to see my site- for the purposes of indexing in a search engine.
    I can't say in robots.txt
    "come in and index for search engines and relevance- but you may not use the data to collect information on our membership for marketing to or marketing their info to others"

    If I build a website all about coffee, I want the information available to the general public, but from/on my site....
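A sketch of the limitation described above, assuming a hypothetical site that wants search engines in and everyone else out. Parsed with Python's urllib.robotparser, the file can only key on the user-agent name and the URL path; there is no field for what the crawler intends to do with the data. Agent names and the example.com URL are illustrative.

```python
from urllib import robotparser

# Hypothetical robots.txt: search engines welcome, everyone else kept out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/members/alice"
# The only inputs are the user-agent string (self-reported by the client) and the URL path.
# There is no directive for "indexing only" versus "marketing" or "research".
for agent in ("Googlebot", "Bingbot", "ResearchCrawler/1.0"):
    print(agent, parser.can_fetch(agent, url))
# Googlebot True
# Bingbot True
# ResearchCrawler/1.0 False
```

An empty "Disallow:" line means "allow everything" for that agent, while the catch-all "*" group applies to any agent not named earlier.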

    --
    every day http://en.wikipedia.org/wiki/Special:Random
    1. Re:Robots.txt is insufficient. by truthsearch · · Score: 2, Informative

      So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.

      Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.

    2. Re:Robots.txt is insufficient. by Ksevio · · Score: 1

      User-agent: *
      Crawl-delay: 10

      Sitemap: http://www.whitehouse.gov/feed/media/video-audio

      That's not such an interesting read these days it seems.
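For what it's worth, the file as quoted above still parses into two useful values. A minimal sketch with Python's urllib.robotparser (crawl_delay() needs Python 3.6+, site_maps() needs 3.8+); the contents are copied from the comment, the live file may differ, and the MinimalCrawler agent name is hypothetical.

```python
from urllib import robotparser

# robots.txt as quoted in the comment above (the live file may have changed).
WHITEHOUSE_ROBOTS = """\
User-agent: *
Crawl-delay: 10

Sitemap: http://www.whitehouse.gov/feed/media/video-audio
"""

parser = robotparser.RobotFileParser()
parser.parse(WHITEHOUSE_ROBOTS.splitlines())

# Crawl-delay: seconds a polite crawler waits between requests.
delay = parser.crawl_delay("MinimalCrawler")
# Sitemap lines apply regardless of which user-agent group they follow.
sitemaps = parser.site_maps()
# No Disallow lines, so every path is allowed.
allowed = parser.can_fetch("MinimalCrawler", "http://www.whitehouse.gov/")

print(delay)     # 10
print(sitemaps)  # ['http://www.whitehouse.gov/feed/media/video-audio']
print(allowed)   # True
```

A crawler that respects this would simply call time.sleep(delay) between requests whenever delay is not None.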

  14. You see, Facebook doesn't only control your... by Jalfro · · Score: 1

    personal information - they own it!

    1. Re:You see, Facebook doesn't only control your... by NeutronCowboy · · Score: 2, Insightful

      Someone ought to mod this up. Facebook's only value is in the information you provide to Facebook about who you are, where you live and who your connections are. As a result, they will defend that little nugget as if their life depended on it - because it does.

      --
      Those who can, do. Those who can't, sue.
  15. Like hell he deleted it though. by Anonymous Coward · · Score: 0

    He'll have a few recordable DVDs lying around somewhere to use when FB eventually dies or he thinks enough time has passed to anonymously float the data out on a torrent.

  16. Don't worry... by turbotroll · · Score: 3, Interesting

    Somebody else will do it again, this time anonymously and with an evil robot that hides its tracks. It only takes Perl, LWP, MySQL, Tor, and a little time and imagination to do so.

    Fuck you, Zuckerberg.

    1. Re:Don't worry... by Anonymous Coward · · Score: 0

      That's an awfully specific set of tools. Don't you think you could have gotten your point across without resorting to dropping names of your pet tools?

    2. Re:Don't worry... by Anonymous Coward · · Score: 0

      Well, he didn't mention his favorite versions.

    3. Re:Don't worry... by Rob+the+Bold · · Score: 1

      That's an awfully specific set of tools. Don't you think you could have gotten your point across without resorting to dropping names of your pet tools?

      Sure man. He gets a nickel every time someone uses MySQL.

      --
      I am not a crackpot.
    4. Re:Don't worry... by Anonymous Coward · · Score: 0

      What's wrong with that? The sooner someone duplicates Warden's work, the better.

      I think he should release his tools, especially his crawler, under the GNU GPL or a similar license. It violates no laws in itself, and releasing it should have the effect of sending hundreds of researchers (who don't care about stupid and worthless legal threats) onto Facebook's network to grab data (which is also legal, as the data is public).

      If Facebook has a problem with that, too bad for them! - Life sucks and then you die... Boo fucking hoo!

  17. You are missing my point by way2trivial · · Score: 3, Interesting

    and I really think it is worth making.

    Copyright protections are important. The snippet of text that Google uses to let people know my site is relevant is easily fair use.
    I don't have a problem with it; I welcome it, as it's beneficial for both myself and Google for it to be there.

    The ENTIRE TEXT of my site, copied and recopied into a web page that exists only to generate AdSense revenue for a third party, is not.
    And if robots.txt had a 'license' mode, I'd have a much stronger case if I chose to pursue a blatant copying and republication of my site.

    robots.txt labels that I wish existed include:
    'allow function: indexing'
    'disallow function: total and complete reproduction'
    'disallow function: total and complete reproduction for XXX days'
    (so I can allow the Wayback Machine and equivalents)
    'disallow function: aggregate data collection'
    'disallow function: user data collection'
    'disallow function: email collection'

    Looking at Amazon (http://www.amazon.com/robots.txt),
    they somewhat do this by putting the information they don't want in the wild into its own directories,
    then disallowing those directories. Actually, now that I look at it, it's a neat way to go.
    But I'd still prefer a robots.txt option that covered permissions for different intended uses of the crawled data.

    --
    every day http://en.wikipedia.org/wiki/Special:Random
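A minimal sketch of the directory-based approach, assuming a hypothetical site at example.com with made-up `/members/` and `/wishlists/` directories (this is not Amazon's actual file). The `disallow function:` line is the wished-for label from the post above, not a real directive; Python's `urllib.robotparser` silently skips it, which is exactly the gap being complained about. `ResearchBot` is an invented agent name.

```python
import urllib.robotparser

# Hypothetical robots.txt: private material lives in its own directories,
# which are then disallowed. The "disallow function:" line is NOT a real
# robots.txt directive; the parser ignores unknown lines like this one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /wishlists/
disallow function: aggregate data collection
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/coffee/brewing-guide", "/members/1234", "/wishlists/xyz"):
    allowed = parser.can_fetch("ResearchBot", "http://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Note that the only thing the file can express is which paths to stay out of; what a crawler does with the pages it is allowed to fetch is outside its vocabulary.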
    1. Re:You are missing my point by thePowerOfGrayskull · · Score: 1

      The ENTIRE TEXT of my site, copied and recopied into a web page that exists only to generate AdSense revenue for a third party, is not.

      You mean like Google's cache? I actually agree with you overall -- it's my data, not yours. You may not publicly exhibit copies of it for your own benefit. It's just that it's a difficult line to draw, in large part because of omnibus monetizing service providers like Google.

    2. Re:You are missing my point by wprowe · · Score: 1

      If you are using an Apache web server, you can use .htaccess to explicitly control content access. That still doesn't discern the intent or use of the content.

    3. Re:You are missing my point by Hatta · · Score: 1

      Copyright protections are important

      Copyright is irrelevant here. Facts are not copyrightable. This data from Facebook is no different than the collection of data in the phone book. Republishing a page from Facebook or the phone book is illegal. Republishing facts sourced from those pages is not.

      --
      Give me Classic Slashdot or give me death!
  18. WHAT TOS? by Anonymous Coward · · Score: 0

    Quote: Facebook claimed he had violated its terms of service

    As I understand it the information was openly available and therefore does not require you to use Facebook friend requests to get it. I fail to see how Facebook can impose a TOS on someone who accesses the site but does not use the service.

    Is it assumed I agree to the TOS of Yahoo.com by visiting the front page? Is it assumed I agree to the TOS of any website just by visiting, even though they may not have explicitly stated I have agreed to it? If I can make people agree to a TOS without their knowledge, then I am going to file a lawsuit against Facebook claiming they owe me $1,000,000 for using my data, because it says so in the TOS right here on my desk.

  19. Clue to Pete Warden. by Anonymous Coward · · Score: 0

    Twilight was written by a Morman Author. That's why it shows up in your morman section. Apparently writing a script to scrape Facebook profiles is easy research, but looking up an entry in Wikipedia is not.

    http://en.wikipedia.org/wiki/Stephenie_Meyer

    1. Re:Clue to Pete Warden. by Anonymous Coward · · Score: 0

      Morman? I thought it was written by a Merman!

    2. Re:Clue to Pete Warden. by Rob+the+Bold · · Score: 1

      Twilight was written by a Morman Author.

      Do you mean The Charch of Jesas Chrast of Lattar Day Saants?

      --
      I am not a crackpot.
  20. Haha! by comm2k · · Score: 1

    The most boring of the clusters, the area around Seattle is disappointingly average.

  21. Interesting data by Anonymous Coward · · Score: 0

    Ignoring the legality of it for a moment, what sort of questions can we ask and answer with the Facebook data? Look how he has managed to divide the US into groups based on who is friends with whom! That's a very interesting way of dividing up a country! StayAtHomeIa. Haha.

    I, for one, wish the entire Facebook profile database were made public (with personally identifiable information removed). The benefit to researchers would be immeasurable.

  22. RTFA by Chees0rz · · Score: 1

    This is one case where I am glad I RTFA. The dataset is destroyed, but there is still a neat little web application to play with. It's fun to poke around with... I find myself wanting more.

    And of course Facebook wanted to shut him down... this is probably data they are collecting themselves and are selling / want to sell :)

  23. "proper" doesn't mean what you seem to think by Anonymous Coward · · Score: 0

    You (and, sadly, many others looking to make a quick buck) seem to think that "proper" anonymization means removal of Personally-Identifiable Information (PII) from the data.

    Removal of PII is neither sufficient nor, in certain cases, necessary for real anonymization. I'll leave the explanatory lecture for my next security class, but a very good rule of thumb for estimating whether an anonymization technique is adequate is whether applying that technique to all documents classified at the Secret level would yield documents suitable for declassification and public release.

    If the anonymization technique you're considering would leave behind information which would require the document to remain classified at the Secret level, then it is not "proper" anonymization.

    This is actually more important and relevant than you might think, as post-9/11, more and more security-related Agencies need to find reliable, automatable methods of publishing (only to other Agencies, of course) the non-classified portions of their classified datasets.

    1. Re:"proper" doesn't mean what you seem to think by thePowerOfGrayskull · · Score: 1
      I pretty much agree with you. In fact, I think we're making much the same point. The level of anonymization that is performed -- either by "white hat" hackers, or by the companies who own rights to the data -- isn't sufficient. Removing PII is, as you say, not proper anonymization; and yet that seems to be all that was done with the FB data.

      The common theme in the replies here on Slashdot is that the data was "anonymized", so surely there is no harm in allowing the researcher to keep and/or disseminate it. My point is that what passes for anonymization in these cases really isn't.
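One common (and itself imperfect) yardstick for the point above is k-anonymity: after names are stripped, group the records by quasi-identifiers such as ZIP code, birth year, and gender, and check the size of the smallest group. A group of one means that person can be singled out without any PII. The records below are invented for illustration, and this is a swapped-in textbook technique, not the Secret-level rule of thumb the poster proposes.

```python
from collections import Counter

# Invented toy records: names are already removed, so no classic PII remains.
records = [
    {"zip": "98101", "birth_year": 1984, "gender": "F", "likes": "Twilight"},
    {"zip": "98101", "birth_year": 1984, "gender": "F", "likes": "Coffee"},
    {"zip": "98102", "birth_year": 1990, "gender": "M", "likes": "Chess"},
]

# Attributes that are harmless alone but identifying in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def k_anonymity(rows, keys):
    """Size of the smallest group of rows sharing the same values for `keys`."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values())


# k == 1: the third record is the only one with its combination,
# so anyone who knows those three facts about a person can find their row.
print(k_anonymity(records, QUASI_IDENTIFIERS))  # 1
```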

  24. This is data, not protected by copyright by digitalgimpus · · Score: 1

    I'm not sure copyright law even applies here, any more than it applies to, say, Google or Yahoo. He scraped DATA from a publicly accessible website as permitted by the robots.txt file. How is this really any different from what Google or Yahoo does? Perhaps the distribution? Though that's hardly significant in this case, as the data is already out there. He just organized the presentation. Sounds to me like Facebook is just pushing buttons to try to avoid another privacy controversy. /IANAL //Don't use Facebook; I'm all too aware of which companies are scraping and misusing what they sniff.

  25. No database copyright by Animats · · Score: 1

    Finding something on the web does not give you the legal authority to publish and redistribute it.

    The US doesn't have "database copyright". The US has Feist vs. Rural Telephone, which says that "facts" can't be copyrighted. It's legal to scan in a phone book and load the address info into a database. You just can't reproduce the page layout; that's covered by copyright. That decision created the third-party phone book industry and began the era of widespread data mining.

    The EULA issue is harder. If you're going to mine Facebook, you probably shouldn't have a Facebook account.

    I'm surprised, though, that Facebook doesn't have systems which prevent programs from accessing pages in bulk.
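Animats wonders why Facebook doesn't throttle bulk access. One common way a site might do that is a per-client token bucket; the sketch below is a generic illustration with made-up rate and capacity values, not a description of anything Facebook actually runs. A fake clock is injected so the behavior is reproducible.

```python
import time


class TokenBucket:
    """Per-client rate limiter: bursts of up to `capacity` requests are allowed,
    and tokens refill continuously at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Spend one token if available; return whether the request is served."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the result does not depend on timing.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=5, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(8)]  # eight requests in the same instant
print(burst)  # first five served, last three refused

fake_now[0] += 2.0  # two seconds later, two tokens have refilled
later = [bucket.allow() for _ in range(3)]
print(later)  # [True, True, False]
```

A site would keep one such bucket per client (keyed by IP address or account, say); a patient crawler that stays under the rate, or spreads requests across many addresses, still gets through, which is part of why throttling alone doesn't stop scraping.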

  26. Statement of Rights and Responsibilities, sec. 3-2 by clone53421 · · Score: 2, Interesting

    You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.

    An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
  27. Re:Statement of Rights and Responsibilities, sec. by Lithdren · · Score: 1

    Putting a line somewhere on your website doesn't mean it applies to everyone who visits your website.
    *Reading this comment entitles the writer of this comment to compensation of no less than 100,000 USD per reading
    I'll assume the check is in the mail, by your logic.

  28. Amusing in light of this story by Lunix+Nutcase · · Score: 1

    Has anyone else noticed this new banner at the top of Slashdot?

    Become a fan of Slashdot on Facebook

    It's funny that, for all the railing against Facebook that is done on Slashdot, Slashdot is advertising for people to become fans of it on Facebook.

    1. Re:Amusing in light of this story by kronosopher · · Score: 1

      That makes sense. It's the only sure-fire way to get FB users here to discover how shitty FB is and delete their accounts.

  29. Re:Statement of Rights and Responsibilities, sec. by Rob+the+Bold · · Score: 1

    You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.

    An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.

    But has the guy even signed up? We're not talking about the Geneva Convention here. Could Facebook really impose its Facebook Constitution on a nonmember? Sure, I understand they'd want to. But wanting and having are two different things, he said, noting the absence of his army of Natalie Portman fembots.

    Do you suggest that this work falls in the realm of unauthorized access? Do you think Facebook has specifically authorized Google? There are Facebook pages in Google's cache, and in Yahoo!'s. And Bing, Dogpile, redz . . . Have they really authorized all of these? These sites are certainly not providing their services without an eye to making money off of them.

    But I could be wrong. Every search engine provider could have a deal with every web page that its system crawls . . .

    --
    I am not a crackpot.
  30. Re:Statement of Rights and Responsibilities, sec. by clone53421 · · Score: 1

    You are correct. Simply reading it does not mean that.

    If you plan on caching and reusing the data, however, it does mean that you should check for applicable terms and copyrights.

    If I see a nice picture gallery on a website, I’m welcome to click through and admire the pictures. But if I want to save them and publish them elsewhere, I’d better check the bottom of the page and/or the TOS page for any copyright notices. It’s no different.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
  31. Re:Statement of Rights and Responsibilities, sec. by clone53421 · · Score: 1

    I don’t think it falls under unauthorized access... I think it’s unauthorized use of the information.

    Yeah, it’s a much trickier question since a lot of spiders have implicit authorization to use the information. Googlebot will obviously spider it and index it for Google, and this is such a well-established fact — as is the way to prevent it from doing so by robots.txt — that not actively preventing Googlebot from accessing the page is probably pretty good justification for claiming that you’re permitting Google’s use of that information.

    I’d accept that excuse from any major search engine for not having explicit, personal permission from Facebook allowing it to spider Facebook pages. I wouldn’t accept any excuse like that from some unknown spider that was designed solely to index Facebook profiles.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
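The "well-established" mechanism clone53421 mentions works through per-agent groups in robots.txt. Here is a sketch with a hypothetical file that admits Googlebot and turns every other crawler away, checked with Python's `urllib.robotparser`; `ProfileScraper` and the example.com URL are invented names. As with any robots.txt, compliance is voluntary on the crawler's side.

```python
import urllib.robotparser

# Hypothetical robots.txt: an empty Disallow lets Googlebot fetch everything,
# while the catch-all group blocks every other agent from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/people/some-profile"
print(parser.can_fetch("Googlebot", url))       # True
print(parser.can_fetch("ProfileScraper", url))  # False
```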
  32. Re:Statement of Rights and Responsibilities, sec. by Sprouticus · · Score: 1

    There is no copyright on information. If he had reproduced the entire site, you might have an argument. But this is raw data: words and numbers.

    If you have a list of sex offenders in your area on your website, and I go to the website and cut and paste the list, that is not copyrighted material (or any list, really).

    If you have a poem on your website, that can be copyrighted.

  33. Who's "you"? A Straw Man? by mdwh2 · · Score: 1

    Firstly, that's a straw man. Companies use generalised data all the time for marketing purposes. And actually, I'd say you're wrong: typically the response to "privacy" rights over public material is that people have no right to privacy, especially if it's on Facebook!

    Secondly, these aren't mutually exclusive. Perhaps some people might have objected to this guy doing what he's doing, but that doesn't mean that it's right to claim he's bound by some TOS.

    But hey, since arguing against straw men is an easy way to get karma, allow me to say actually, you're wrong, copyright infringement isn't stealing, and Linux is better than Windows.

  34. Re:Statement of Rights and Responsibilities, sec. by clone53421 · · Score: 1

    First of all, even if there is not a copyright on pure information, there can still be a license on its use. You were given the information under the implicit license that you are a web browser and permitted to do what web browsers do: display the information for someone to read, download, etc. If you vastly expand on that functionality or do something altogether different with the information, you are no longer within the implicit license that was given to you when the server gave you the page. Unless perhaps you gave it a User Agent string that is indicative of the fact that you’re nothing it’s ever seen before.

    Second, a lot of the profile information would be considered creative and protected under copyright. Your religion? Not just a drop-down; you are free to write in whatever you want. This answer could be as generic or creative as you choose. A few of my more creative friends wrote “Following Jesus”, “Jesus is the way, the truth and the life”, and “Jesus died for me AND you!!” (yes, I have that sort of friends). That’s not just information, it’s a short essay response. Your favourite movies? TV programs? Same. The profile picture is certainly copyrightable. Name? Now you might think this, if anything, is cut-and-dried, but a few of my friends apparently think that “Name” is an outlet for their creativity as well...

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
  35. Re:Statement of Rights and Responsibilities, sec. by Sprouticus · · Score: 1

    You have a valid point about #1; that is a matter of law, as I am not sure what constitutes a valid license on the use of information that is in no way limited in its availability. The problem is that even if it violates the ToS, you don't have to accept the ToS to see that data.

    The second part is just wrong. Answering a question in a creative, amusing, or entertaining manner is not a creative work. It's an answer to a question.

  36. Re:Statement of Rights and Responsibilities, sec. by clone53421 · · Score: 1

    Answering a question in a creative, amusing, or entertaining manner is not a creative work. It's an answer to a question.

    Yes, it is. The fact is not creative, but the presentation is, and if you simply copy the presentation verbatim, you have violated the creative work.

    “Christian” may be a simple fact and not copyrightable, just like phone numbers and addresses are simple fact and you cannot copyright the phone book. However in most phone books some listings are larger and use graphics, colours, and/or borders to emphasize them; this layout is creative and can be copyrighted. Similarly the phrase that someone writes to indicate that they are Christian can be copyrighted.

    You can’t just cut the binding off a phone book and run it through a duplex copier. You have to scan the pages, eliminate the creative part (the way that the information was presented) and create your own presentation of those facts. Similarly to copy the facts from a social networking profile, to avoid violating the author’s creative work you’d have to sanitize all of the creative portions of the profile. For instance, if you determine from their answer to “Religion” that they are a Protestant, that is a fact and you are free to reproduce it. However the answer itself may very well be a creative work.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
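A rough sketch of the distinction drawn above: keep only a coarse factual category and throw away the verbatim, possibly creative, answer. The keyword table and category names are hypothetical, and keyword matching like this is crude; it illustrates the idea of distilling a fact, not a complete classifier.

```python
# Hypothetical keyword table mapping free-text answers to a coarse category.
CATEGORY_KEYWORDS = {
    "Christian": ("christian", "jesus", "protestant", "catholic", "baptist"),
    "Jewish": ("jewish", "judaism"),
    "Nonreligious": ("atheist", "agnostic"),
}


def distill_religion(answer: str) -> str:
    """Return only a coarse factual category; the verbatim answer is discarded."""
    text = answer.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "Unknown"


answers = ["Following Jesus", "Jesus died for me AND you!!", "Agnostic", "Pastafarian"]
print([distill_religion(a) for a in answers])
# ['Christian', 'Christian', 'Nonreligious', 'Unknown']
```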
  37. Re:Statement of Rights and Responsibilities, sec. by Sprouticus · · Score: 1

    You are absolutely correct.

    But that is not what he did. He took the raw data from FB profiles (public ones, not private ones) and then used the raw data (not the presentation) to data mine interesting information.

    There is NOTHING copyrightable about that raw data. Take a look at the links above for his article on the 'zones' in the US. It's actually quite fascinating from a sociological standpoint.

    My point is simple. FB had no right to threaten copyright on the data. If he had reproduced the pages en masse, sure, that would be a violation. But the data is NOT.

  38. Re:Statement of Rights and Responsibilities, sec. by clone53421 · · Score: 1

    My point was that what you are calling “raw data” was in fact the copyrightable presentation of raw data.

    If you scanned the pages of the phone book, digitally cropped out the listing for each number (including colours, fonts, graphics, and borders for any listings containing those), re-alphabetized and re-printed those exact duplicates of the listings at 200% for low-sighted individuals (supposing the actual arrangement on the page would be completely different, since you didn’t necessarily make the paper and margins 200% larger as well)... you’d still be violating the creative work of the original publisher.

    And if you took the words verbatim, sliced out the answers to the questions on their profile, and stored those in a new database... you’d be violating the creative work of the people who wrote those answers.

    What is or is not a creative work may be debatable on any small, single snippet. However, if you store, verbatim, millions of people’s answers to the questions on their profiles, are you violating someone’s creative work? Without question you are, because even if most of them cannot be considered creative, some of them are creative.

    The generic, Times New Roman 10 pt listing with no fancy colours, borders, or font styles? Not creative. The guy who listed “Christian” under his religious views? Not creative.

    The listing with an eye-catching graphic, border, colours, and larger font size? Creative. You could extract the information from it, discarding all of the creative features, and you’d be fine, but you can’t make an identical copy of the listing. The guy who wrote a short paragraph under his religious views? The same: you can distill out a factual answer from his creative response, but duplicating the response verbatim would violate his authorship of a creative work.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
  39. It is publicly available by thetoadwarrior · · Score: 1

    I fail to see how he did anything wrong. If FB doesn't like it, then they can change how their site works.

  40. See CLONE THE TROLL, screwing up hugely by Anonymous Coward · · Score: 0

    See subject, and then see troll Clone do so -> http://slashdot.org/comments.pl?sid=1591778&cid=31703134

    Utterly hilarious - Clone opened up his piehole & now he can't back up his pure b.s.!

  41. See CLONE THE TROLL grovelling, lol by Anonymous Coward · · Score: 0

    See subject, and then see Clone do so -> http://slashdot.org/comments.pl?sid=1591778&cid=31703134

    Utterly hilarious - Clone opened up his piehole & now he can't back up his pure b.s.! He avoids answering questions at ANY cost, lol, when he knows he's f'd up here... hilarious!

  42. Facebook and the like are evil. by Anonymous Coward · · Score: 0

    They are scared of you knowing the truth.

  43. Maybe he did ... by dougmc · · Score: 1
    It's entirely possible that he did contact the EFF.

    But the EFF can't fight every battle -- they go after the ground-breaking ones, the ones that will have the highest benefit/cost ratio. It's not clear that this is such a battle.

    1. Re:Maybe he did ... by shentino · · Score: 1

      If we had a loser-pays system, the EFF would be able to fight a lot more battles without running its treasury down with nonrefundable legal expenses.

  44. Re:Statement of Rights and Responsibilities, sec. by xenobyte · · Score: 1

    An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.

    No, but it's not a ban either.

    Common sense dictates that if data is publicly accessible and not accompanied by a specific usage limitation, you can mine the data and use it for scientific purposes as fair use. This guy did not charge for his results, nor for the compiled data, so it was textbook fair use.

    Remember, he did not use the collected data directly but only the relationships it inferred. That information is the product of the crawler's compilation, not the data itself, and only the data itself can be copyrighted. It's just like the fact that you cannot copyright the mood a certain piece of music or movie puts you in, only the music or movie itself. The mood is the product of an interpretation of the music or movie, and while it may be an intended result, it is still not a part of the music or movie itself.

    If only... I could copyright sappy lovesongs... Profit!!

    --
    "For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --