Facebook Kills Dataset of Crawled Public Profiles
holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, anyone who wants it has probably made their own version already.
I see very little problem with an automated scan that respects robots.txt.
By not blocking automated access to the profiles, Facebook is squarely at fault.
I don't see Facebook going after Google, even though the data that they possess is ostensibly the same as Warden's. The primary difference that I see is that Warden was offering analysis and results for free, not trying to monetize them. Maybe that's what made them mad.
Why do you think they threatened him? They want to sell this data themselves.
"In America, first you get the sugar, then you get the power, then you get the women..." -H. Simpson
They're not wrong though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself; all I've done is remove everything from my profile and use it as a hub to stay in contact with people all around me. I haven't gone as far as to stop using the site, and I don't think I will. Nor will many people.