Slashdot Mirror


Facebook Kills Dataset of Crawled Public Profiles

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, that means anyone who wants it has made their own version.

8 of 158 comments

  1. Very interesting by Bearhouse · · Score: 2, Informative

    I'll let others debate the 'privacy' issues; (personally I think there's nothing wrong with scraping profile information that people have explicitly made 'public')
    Anyways, just check what he did with it; very interesting: (FTA)
    http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html
    There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome.

  2. Re:For an Interesting Exercise in Head Asplosion by Tobor the Eighth Man · · Score: 3, Informative

    Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

  3. Re:Robots.txt is insufficient. by truthsearch · · Score: 2, Informative

    So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.

    Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
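    To make the "just a suggestion" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration; they are not Facebook's actual rules. The check runs entirely on the client side, so a crawler that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler asks before fetching; the answer is advisory only.
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page")
public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/people/someone")

print(private_ok, public_ok)
```

    Whether to honor the result is up to the crawler's author; the server only finds out if it inspects the requests itself.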

  4. Re:For an Interesting Exercise in Head Asplosion by Rantastic · · Score: 2, Informative

    Finding something on the web does not give you the legal authority to publish and redistribute it.

    Nonsense.

    Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.

    Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

    If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.

    --
    Ask Slashdot: Where bad ideas meet poor googling skills.

  5. Re:Yes, by all means, let's stamp out... by thePowerOfGrayskull · · Score: 2, Informative

    Removing names isn't necessarily enough. The recent Netflix case shows that. I think it's interesting that nobody catches the broader implications of that discussion, namely that whether they're "anonymizing" data for purposes of providing it for research, or selling it for marketing... the ability to reverse engineer patterns to undo it remains a risk.
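    A toy sketch of how that kind of re-identification can work, assuming an attacker holds a second, named dataset that overlaps with the "anonymized" one. Every record, name, and ID below is invented, and this is not the method from any real case; it only shows that matching on overlapping patterns can link a pseudonym back to a person.

```python
# "Anonymized" release: names replaced by opaque IDs, ratings kept as-is.
anonymized = {
    "user_17": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    "user_42": {("Movie A", 3), ("Movie D", 1), ("Movie E", 5)},
}

# Hypothetical public data where people posted a few ratings under their names.
public_reviews = {
    "alice": {("Movie A", 5), ("Movie C", 4)},
    "bob": {("Movie D", 1), ("Movie E", 5)},
}


def best_match(known, candidates):
    """Return the candidate ID sharing the most items with `known`, and the overlap size."""
    scores = {cid: len(known & items) for cid, items in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Link each named person to the most similar pseudonymous record.
links = {name: best_match(items, anonymized) for name, items in public_reviews.items()}
print(links)
```

    With real data the overlap would be noisy and the match would need a confidence threshold, but the risk is the same whether the release is for research or for marketing.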

  6. Re:For an Interesting Exercise in Head Asplosion by crashumbc · · Score: 2, Informative

    Unless something has changed, you have to "log in" to see anything on Facebook. Even if a page is "public" you can't view it without logging in with your own account.

    A crawler may or may not bypass that...

  7. Re:On what grounds? by cdrguru · · Score: 2, Informative

    If your position in entering the above motion was that "I'm right, so I should win" and offered nothing else - such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.

    The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.

    You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone that doesn't know computer programming and having them debug a program in an Assembler language.

    Don't have the time to learn all this stuff? Well, that is why we have lawyers.

  8. Re:For an Interesting Exercise in Head Asplosion by clone53421 · · Score: 2, Informative

    They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

    Here.

    --
    Alexander Peter Kristopeit bought his basement from his mommy for one dollar.