Massive Tinder Photo Scrape Has Users Upset (techcrunch.com)
Images of Tinder users "were swept up in a massive grab of some 40,000 photos from the dating app by a dataset collector who plans to use the selfies in artificial intelligence training," writes Slashdot reader Frosty Piss, sharing this summary of a report in TechCrunch.
Tinder said in a statement that the photo sweeper "violated the terms of our service" and "we are taking appropriate action and investigating further." The creator of the data set, Stuart Colianni, has released it under a CC0: Public Domain License and also uploaded his scraper script to GitHub.
He describes it as a "simple script to scrape Tinder profile photos for the purpose of creating a facial dataset," saying his inspiration for creating the scraper was disappointment working with other facial data sets. He also describes Tinder as offering "near unlimited access to create a facial data set," and says scraping the app offers "an extremely efficient way to collect such data."
The article notes that Tinder's API has already been used for other "weird, wacky, and creepy" projects, including "hacking it to automatically like every potential date to save on thumb-swipes; offering a paid look-up service for people to check up on whether a person they know is using Tinder; and even building a catfishing system to snare horny bros and make them unwittingly flirt with each other.
"So you could argue that anyone creating a profile on Tinder should be prepared for their data to leech outside the community's porous walls in various different ways -- be it as a single screenshot, or via one of the aforementioned API hacks. But the mass harvesting of thousands of Tinder profile photos to act as fodder for feeding AI models does feel like another line is being crossed."
well, i like it
Dairy cattle are upset about being milked.
Sheep are upset about being shorn of their wool.
Now all you users of Tinder, Facebook, Google+, iCloud, etc. need to line up and say "BaaaAAAAaaaa".
You brainless goddamned chumps. You deserve to be milked because you're so fucking stupid.
Looks like everyone has gone to Tinder to remove their profile photo ^_^
Tinder said in a statement that the photo sweeper "violated the terms of our service" and "we are taking appropriate action and investigating further."
TOS is meaningless in cases like this. TOS are meaningless anyway, except as, perhaps, a means to ban users. And that's pretty pointless as well.
But really, what do people that put their photographs out on the Intertubes like this expect? Privacy? Really?
If you want news from today, you have to come back tomorrow.
The data wants to be free!
I'm also an AI researcher. If I need a face dataset I could either use CelebA or the Facebook API to scrape user profile pictures. There are also mugshot databases and public DoJ and county jail mugshot APIs, so there's that.
Now with "GAN" generative models, there's very little need for large datasets unless the existing datasets are biased in some way.
Let's get real here: someone wanted to build a Deep NN classifier for sexual promiscuity. Other than attention whoring, that's the only reason to harvest Tinder users specifically.
Grindr would do well to batten down their hatches. Training a NN to classify "heterosexuality" from their userbase is the next natural progression. Perfect for a homophobic witch hunt in 3rd world countries. Would I go to hell if I sold such an app to a Middle East law enforcement agency? Doesn't matter if it works as long as you can demonstrate efficacy on the training data.
Their purchasing agents are unlikely to be sophisticated enough to understand the importance of "hold out data", so it wouldn't be hard to put together a demo with near perfect accuracy.
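To make the "hold out data" point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption, not anything from the scraped dataset: the features and labels are random noise, and the "model" is a simple 1-nearest-neighbour lookup that memorises its training set. It scores perfectly on the data it was trained on and roughly at chance on data it never saw.

```python
import random


def nearest_label(train, point):
    """Return the label of the training example closest to `point` (1-nearest-neighbour)."""
    closest = min(
        train,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], point)),
    )
    return closest[1]


def accuracy(train, examples):
    """Fraction of `examples` whose label matches the 1-NN prediction from `train`."""
    hits = sum(nearest_label(train, features) == label for features, label in examples)
    return hits / len(examples)


rng = random.Random(0)

# Hypothetical data: 8 random features per example and a coin-flip label,
# so there is nothing real for any model to learn.
data = [([rng.random() for _ in range(8)], rng.randint(0, 1)) for _ in range(400)]

# Keep 100 examples aside that the "model" never sees during training.
train, held_out = data[:300], data[300:]

train_acc = accuracy(train, train)        # each point is its own nearest neighbour
held_out_acc = accuracy(train, held_out)  # honest estimate on unseen examples

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {held_out_acc:.2f}")
```

A demo that only reports the first number looks flawless even though the labels are pure noise; the second number is the one a buyer would need to ask for.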
Putting photos out where anybody can see them means putting photos out where anybody can see them.
I was thinking about making an autoliker that only liked attractive people using machine learning, and learn neural networks while at it. This dataset will come in handy.
"The article notes that Tinder's API has already been used for other "weird, wacky, and creepy" projects, including "hacking it to automatically like every potential date to save on thumb-swipes"
Where is this? Please, I need it!
As the www evolved, I did not think it would turn into the creepy, kooky, cluster-fuck of confusion that is the modern 'social' internet. It's what we are stuck with now though in these modern times.
Maybe don't forget there is a real 3 dimensional space time in which we all really exist as human beings and live together on planet earth.
Seems like people who had selfies scraped could file DMCA takedowns as they would own the copyright to the images.
Really though, is this surprising? Seems like one could get most of these images from Facebook directly anyway.
I can see downloading for research purposes as being ok. And I can see developing the algorithms as being ok. I can even see uploading the algorithms as being ok.
Now all of the above is predicated on not violating the terms the "researcher" agreed to if/when he signed up for the account he used. Assuming an account was required.
But uploading the photos taken somewhere else for public consumption is just wrong.
Abuse of privileges is how we so often end up where we find ourselves as a society. This breach of the public's confidence is just another stab in the back to a society that values respect.
Caution: Contents under pressure
The article links this as being the dataset "consist[ing] of six downloadable zip files, with four containing around 10,000 profile photos each and two files with sample sets of around 500 images per gender."
https://www.kaggle.com/scolian...
Which gives a 404.
If you're a zombie and you know it, bite your friend!
:P Sorry for the title.
But really, to the people complaining about this, ALL of your publicly accessible photos are subject to, and have probably already been swept into, some "massive photo scrape". Tinder is saying they'll do "something" about it just because, you know, PR speak, but they can't do much other than ban the accounts which did it... which pretty much amounts to nothing.
This also could easily be done with any social network's profile photos. Any service where you can easily create a profile and search for people programmatically is subject to this.
Would the same people who "allow it" because the photos are publicly accessible also condone cheating in online games because the games don't prevent it? Tinder is not my kind of thing, but I understand that there is an expectation that the profiles are not globally browsable by anyone. And even if they were, do you really want an internet where nobody publishes anything in public anymore because everything will be harvested by a bazillion scrapers? That's an internet without search engines, with millions of tiny walled gardens that you have to be a member of and search individually. There needs to be a distinction between making something public for certain purposes and giving up all control.
Putting aside all the victim blaming for a second...
This is meant to be a private (closed-source) application, with a private API talking to a private server.
Why the hell can anyone (read: unauthenticated users) access private data via a public and unrestricted URL? I've read articles reverse engineering their API. It's terrible! This is another company that did not put enough time and effort into securing the application and API, and now users (read: non-technical, real people, some of whom paid money, all of whom trusted the company) are left exposed.
I really wish there was a way to force companies (i.e., legislate) to place far higher importance on this. I've also been in situations where, as a developer, I've had managers scuttle or ignore requests to lock things down, in the interests of deadlines or cost or, worse yet, "we'll fix it once it's up and running."
I've personally made a few bucks scraping the entire user database of a bar, online golf game that ran tourneys. For their competitors.
Just don't advertise what you're doing and nobody cares.
I never had an account, never agreed to TOS. Zero security, enter acct name, data came up. Just hammered the site, brute force.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Pics or it didn't happen.
Fuck me, this guy is a full blown autist.
He used Python, so he's probably just a mild aspie. If he had been a full-blown autist he would have used ruby.
lucm, indeed.
And was almost expelled for FaceMash
I bitch and moan about Facebook and AI. After all, I am just a "no social life, crazy" Linux user. You tell people how stupid it is to use anything related to Facebook and that AI is nothing more than a tool to kill encryption and anonymity, and people get angry; I'm sure Slashdot keeps a record of my comments for you to confirm my rantings on this. And not to take credit, for many others have made this point as well. I know Slashdot has a Facebook and there will probably be a bunch of moderating in the next 12 hours. And before the ACs decide to flood the rest of the space with "Leftist and Rightist" mumbo jumbo to distract everyone, or FB releases damage control articles, I do not mind being that asshole at all to say all of us told you so. I hold my cardboard sign proudly.
Well aspie or not, he's not wrong.
The guy's a sociopath. That's not the same as autism.
No link to the GitHub project? C'mon Slashdot, why do I even bother with you?
You can only get matches if you use boost these days and depending on which Plus account they decide to put you on, that could be 1 boost per week or 1 per month and at either $10 per month or $20. They are also expensive if you buy them directly instead of using Plus and only last 30 minutes. I think more than half of the accounts on it are bots or inactive accounts, no way to tell the latter since they don't tell you the last time someone was on anymore. You can use the 1 free superlike each day (or 5 if using Plus) to act as a boost on one person, I just think most women end up taking those the wrong way, even if they liked back, so it does more harm than good unless you were already really hot in the first place.
So... has anyone actually seen the dataset? And can you make any comments about it?
-- I was raised on the command line, bitch
Well, the APIs allowed this to happen, which makes me think the TOS is at best an afterthought. As for users, you posted that stuff; based on history, what the hell did you think would happen? Either get over it or stop behaving so irresponsibly.
Shiver me Tinders?
A concerted effort can and should be made to make this person's Family and Employer aware of his actions.
Well, it is.