Researchers Release Profile Data on 70,000 OkCupid Users Without Permission (vox.com)
An anonymous user shares a Vox report: A group of researchers has released a data set on nearly 70,000 users of the online dating site OkCupid. The data dump breaks the cardinal rule of social science research ethics: It took identifiable personal data without permission. The information -- while publicly available to OkCupid users -- was collected by Danish researchers who never contacted OkCupid or its clientele about using it. The data, collected from November 2014 to March 2015, includes user names, ages, gender, religion, and personality traits, as well as answers to the personal questions the site asks to help match potential mates. The users hail from a few dozen countries around the world. The researchers, Emil Kirkegaard, Oliver Nordbjerg, and Julius Daugbjerg, ran software to "scrape" the information off OkCupid's website and then uploaded the data onto the Open Science Framework, an online forum where researchers are encouraged to share raw data to increase transparency and collaboration across social science.
The data was already public! What are you whining about?
If it's information that's publicly available, what's the problem? That is not "taking identifiable data without permission." Making it public IS permission.
Don't want to know something? Don't make it available to all and sundry on the internet. Sheesh.
"Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
I'm not going to name any names, but *several* Slashdot users appear not to be able to read summaries with any degree of accuracy - the data is not public, but only AVAILABLE TO OkCupid USERS (yes, that is what the summary actually says).
*Very* important distinction.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
"This record has been suspended"
https://osf.io/p9ixw/files/
Kirkegaard's other work (still available) on Open Science Framework: https://osf.io/a2yfn/
Interestingly enough, it works out to be great advertising for a really neat science site/service...
Gave OkCupid data to get laid.
Now it appears that I'm fucked if anyone looks at this data and figures out which userid is mine.
"cardinal rule of social science research ethics". If that isn't enough of a red flag, I don't know what is. Anyone who puts that much time into a concept surely is out to fuck someone.
1. Because there is a distinction between data-mining user information and browsing user profiles as an individual.
2. Because the person did not hold a copyright in any of the material which he scraped and uploaded to another site. The terms of service at the second site require him to only upload material he has a right to upload. He violated their terms of service. I am sure that is why the material is now down. https://osf.io/p9ixw/
Interestingly, though, OkCupid's /profile is not blocked in their robots.txt.
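For anyone who wants to check that kind of claim themselves, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content below is a made-up sample, not OkCupid's actual file, and is_path_allowed is a hypothetical helper name. Keep in mind that robots.txt is only an advisory convention for crawlers; a path being allowed there says nothing about the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT OkCupid's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /settings
Disallow: /messages
"""


def is_path_allowed(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# /profile has no Disallow rule in the sample, /settings does.
print(is_path_allowed(SAMPLE_ROBOTS_TXT, "/profile/someuser"))
print(is_path_allowed(SAMPLE_ROBOTS_TXT, "/settings"))
```

To test a live site you would fetch its robots.txt yourself and pass the text in; the sample string is used here so the sketch runs offline.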
Real lawyers write in C++
If you wouldn't say it to a person you have recently met, it should not be online. That being said, this is still a crime. As a member I am moderately upset.
Making it public is not posting it to a site.
However, how is it a problem for a company to post that if it's not personally identifiable?
How could 70,000 people be identified out of the population of the English speaking world? Maybe the site supports other languages; if so, include those speakers too. I do not know the site personally.
The point is the same: how could those data points personally identify anyone? I don't think that they can, and anyone thinking they can needs to prove it from the bottom up.
from TFA: "The data dump did not reveal anyone's real name."
Usernames, etc., were revealed. A clever person might be able to find the true owner of an account if it was really important to him/her. Time will tell if any puppies were injured by this action.
...omphaloskepsis often...
Emil Kirkegaard is a total scumbag.
Gosh I hope I cancelled my account before then.
---- The above post was generated by the Turing Institute. Maybe.
Last year some jackass named John Greenewald Jr. ripped off a small open source software project designed to data-mine FOIA websites that scan and digitize historical documents. Amusingly the public reaction was completely reversed from how people are flipping out about the OkCupid data theft.
This loser of a human being, John Greenewald, ripped off hundreds of thousands of documents and never uttered a single word about where he got the data or how he ripped off an open source software team that had been developing the project for years. Nor did he give attribution to the company that did the actual work of scanning the documents, and made them available for free no less. On top of all that, this complete parasitic loser even had the balls to try to monetize Fold3's work. Yet hilariously, people still have the temerity to attack Fold3 and Ancestry.com, claiming the company was somehow in the wrong for forcing this attention-whoring sideshow clown to remove the data from his website or face a lawsuit.
The sad truth is people don't care about the actual morality of data theft. They only care about whether or not the data is personally beneficial to them, and if it is, well, ... then it's okay.
You can be fired and unofficially blacklisted. Academia can be more political than D.C.
No, we don't want administrative punishments.
What kind of banana republic do you live in...?
If there is a breach of law, it should be handled in court, or perhaps a civil settlement will do in this case...
I dislike it when people think companies or educational institutions should take the law into their own hands, and punish students/employees for what they do in their private lives.
Sounds like the students who did this might not have thought about all the consequences.
As for whether or not it's legal... That is hard to say; technically copyright has a lot of exceptions when it comes to research and education.
That said, a court could also rule that students could do research without publishing the raw data; and that therefore privacy outweighs research and education exceptions in this particular case (because the data is particularly sensitive).
Regardless, it is not for the University to punish students for spare time activities. A University cannot act as prosecutor, judge and jury. We have courts for that!
I can't speak to the legality of the researchers' actions, but as a Social Scientist (cue jokes about not being a real scientist), I can tell you that their actions were unethical. Specifically, I'm shocked that their Institutional Review Board (IRB) thought it was ok to upload this data to a forum where all can have access.
Social Scientists, when conducting research, are under a moral obligation to make sure that their participants are not under more than 'minimal risk' as a result of the research. The most common heuristic for that minimal risk is whether the researchers are making the participants susceptible to more risk than they would normally be susceptible to. In this case, while the participants had provided data to a semi-public forum (i.e. OkCupid), making the data easier to extract and mine definitely puts the participants at higher risk for data related crimes (e.g., identity theft, bank fraud).
If those researchers aren't in proverbial hot water yet with their institutions, they will be when the lawsuits come. The lesson to be learned here if you are a researcher: your IRB exists for a reason; check with them before creating a new protocol.
http://emilkirkegaard.dk/en/
OSF has now suspended the entire repository, not just deleted the user datafile. Not sure why this is the case. So for now, the paper PDF will be available here: OKCupid_public_dataset_paper Edited to add: The repository is closed due to a DMCA request sent by OKCupid which is currently being investigated.
A good use of the DMCA in this case IMO. (Though surprised it worked overseas.)
I need it for ... er ... research purposes.
Yeah, all these stories without a single link to the data? Idiots. Where's the link?
Any idea why the US gov isn't suing them?