Over 40% of New Mechanical Turk Jobs Involve Spam
An anonymous reader writes "An NYU study reveals that over 40% of the jobs posted by new employers on MTurk are some sort of spam request, such as fake account creation, fraudulent ad clicks, or fake comments, tweets, likes and votes. The study also shows that the bad jobs could be automatically filtered with 95% accuracy, but Amazon is not interested."
I guess you really can't build a robot shill.
The article says, "when we informed them about the issue. They pretty much assured us that everything is fine." Can we have some kind of quote from Amazon, or do we need to take the vague interpretation at face value before drawing the article's conclusion that "Amazon is not interested"?
http://en.wikipedia.org/wiki/The_Turk
"We look forward to continuing to serve our AWS customers and are excited about several new things we have coming your way in the next few months."
Well, I'm looking forward to you confirming the deletion of my account I requested a week ago. And that 2nd part sounds like a threat.
I hope I didn't brain my damage.
How will Hormel ever recruit any new employees?
So when 40% of their MT service usage is contrary to the ToS, everything's fine and dandy.
But when Wikileaks is in full compliance with the ToS of their EC2 service, they get the boot?
I had to look this up.
Amazon Mechanical Turk (beta)
Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mechanical Turk web service enables companies to programmatically access this marketplace and a diverse, on-demand workforce. Developers can leverage this service to build human intelligence directly into their applications.
While computing technology continues to improve, there are still many things that human beings can do much more effectively than computers, such as identifying objects in a photo or video, performing data de-duplication, transcribing audio recordings or researching data details. Traditionally, tasks like this have been accomplished by hiring a large temporary workforce (which is time consuming, expensive and difficult to scale) or have gone undone.
Mechanical Turk aims to make accessing human intelligence simple, scalable, and cost-effective. Businesses or developers needing tasks done (called Human Intelligence Tasks or “HITs”) can use the robust Mechanical Turk APIs to access thousands of high quality, low cost, global, on-demand workers—and then programmatically integrate the results of that work directly into their business processes and systems. Mechanical Turk enables developers and businesses to achieve their goals more quickly and at a lower cost than was previously possible.
Drill baby drill - on Mars
So, would the filtering of bad services from MTurk be performed using MTurk?
Same reason the USPS likes bulk mailers... they keep the operation afloat. Especially as more and more people turn to email.
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
I know a few research scientists who use the Turk for some awesome ideas (it's a LOT cheaper than in-person human subjects and the people you get aren't homeless, drunks, or freshman psych students fulfilling requirements). However, there is little money in (non-military) basic research at the moment, and only a fraction of that even requires human subjects.
The rest is merely a new breed of on-demand advertising and promotion. Amazon is still getting paid, so they likely don't care. I'd argue that if they don't want to squash the problem altogether, they should at least isolate it so that people have an easier time getting to wherever they were heading, e.g. "help me solve vision" versus "help me get popular".
Use my userscript to add story images to Slashdot. There's no going back.
Did anyone else notice that the summary says 95% accuracy but doesn't break it down to False Accept and False Reject?
Not to mention, spammers adapt. That's the main problem with them.
How do we know that some of that spam is not from amazon itself?
"Accuracy" is a difficult measure to quantify. I see from reading the article that the accuracy has been estimated at 95% due to a 95% true positive rate and a 95% true negative rate. Given that the current spam rate is 40%, these rates aren't particularly bad, but Amazon would still have quite a few problems with angry customers. Assuming 1500 HITs per day, with 60% of those being non-spam submissions, 45 per day (5% of 900) would be falsely flagged as spam.
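For anyone who wants to check that arithmetic, here is a rough sketch in Python. The 95% rates and the 40% spam share come from the article; the 1500 HITs per day is only my assumption, and the function name is made up for illustration.

```python
def confusion_counts(total_hits, spam_rate, true_positive_rate, true_negative_rate):
    """Split a day's HITs into the four confusion-matrix cells.

    All inputs are assumptions for illustration; counts are rounded
    to whole HITs.
    """
    spam = round(total_hits * spam_rate)
    legit = total_hits - spam

    true_positives = round(spam * true_positive_rate)   # spam correctly flagged
    false_negatives = spam - true_positives             # spam that slips through
    true_negatives = round(legit * true_negative_rate)  # legit jobs left alone
    false_positives = legit - true_negatives            # legit jobs wrongly flagged

    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "true_negatives": true_negatives,
        "false_positives": false_positives,
        # Share of flagged jobs that really are spam.
        "precision": true_positives / flagged if flagged else 0.0,
    }


counts = confusion_counts(1500, 0.40, 0.95, 0.95)
for name, value in counts.items():
    print(name, value)
```

Under those assumptions, 45 legitimate jobs get flagged and 30 spam jobs get through each day, so roughly 7% of everything the filter flags would be a legitimate customer.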
Ask me about repetitive DNA
"The study also shows that the bad jobs could be automatically filtered with 95% accuracy, but Amazon is not interested"
is like saying spam can be filtered at the same rate, but we all know how that works...
This just got me thinking. Could the service be used to game the App Store? Currently, there are several companies offering to get any free App into the top 25 list for $5000. It's widely believed that they use bots to do it, but it could just as easily be mechanical turks.
That data is from two months back, before Google Places appeared in web search. Now, it's worse. There's a whole mini-industry in the "black hat" search engine "optimization" community creating phony Google Places entries. Here's an ad on Mechanical Turk today:
Google Places spamming hasn't been fully automated yet, so we get to watch spammers outsource their manual spamming. Spamming Google Places is incredibly easy, much easier than creating the link farms required to spam Google's old web search. See the instructions in "Dominating Google Maps- The Most Effective Spam Ever And What You Can Learn From It".
Google Places has been 0wned.
I find it interesting that the people placing the HITs have to decide whether the work done is good quality and then decide to pay or not. So that means for each tiny job you farm out, you have to do your own tiny bit of make-work to decide whether to pay or not. Can you farm this out on the Turk too? If not, maybe there's a market for a service that lets you do so...
http://lkml.org/lkml/2005/8/20/95
From my time exploring MTurk I would have guessed it to be much higher than that; non-spam-related jobs were definitely the minority of what I saw.
The creepiest (and highest paying) job I saw though involved watching surveillance footage from airports, making sure the automated face tracker stayed on target...
This is a fake comment. A real one would have looked different.
I'm surprised you weren't modded up.
I'm also surprised at how low the wages are at this turk thing, when a 15+ page script needs to be read just to get started. In the US, centuries of constant demolition and rebuilding mean that house numbers easily jump from "1" to "21" when maybe 10 houses on the same block no longer need individual numbers after the block turns into a single vacant lot.
Marketing fake numerical addresses in between legit ones ensures that Google Pagerank rates your "unique" business as #1 for certain keywords that only the nonexistent address owns. When I learned this 2 weeks ago, I thought spammers had to at least sweat through that manual task by themselves... now? What a bummer!
I hope the folks at Google start trolling the same MTurk job listings to mark down location spam for what it is...
I'm also surprised at how low the wages are at this Turk thing. ... I thought spammers had to at least sweat through that manual task by themselves.
It's like $0.25 per human-generated spam. Automation seems to be coming. I'm seeing mentions on black hat SEO forums that an automated tool for doing this in bulk will be released early next month.
Marketing fake numerical addresses in between legit ones ensures that Google Pagerank rates your "unique" business as #1...
Sometimes. That technique is mostly used to give real businesses extra bogus locations. Check out "New York City locksmith", for example. Other heavily spammed terms are "carpet cleaning" and "divorce lawyer".
This week's new technique is described at "How To Spam Google Maps For Top Google Place Listings". This is like SQL injection for mailing addresses. The trick depends on Google's parsing of mailing addresses from the top, while USPS standards say they should be parsed from the bottom line upward. So a mailing address with two street addresses is parsed differently by the USPS and Google, allowing the spammer to redirect Google's confirmation postcard to some mail drop.
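To show why the two parsers can disagree, here is a toy sketch in Python. It is not Google's or the USPS's actual parsing logic; the regex, the function names, and the address are all made up for illustration. The only assumption carried over from above is that one side reads from the top and the other from the bottom.

```python
import re

# Crude, hypothetical test for a street line: starts with a house number.
STREET_LINE = re.compile(r"^\d+\s+\S+")


def street_top_down(lines):
    """Return the first line that looks like a street address."""
    for line in lines:
        if STREET_LINE.match(line):
            return line
    return None


def street_bottom_up(lines):
    """Return the last line that looks like a street address,
    i.e. the one closest to the city/state/ZIP line."""
    for line in reversed(lines):
        if STREET_LINE.match(line):
            return line
    return None


# Entirely fictional address containing two street lines.
address = [
    "Acme Locksmith",
    "123 Main St",
    "456 Elm Ave",
    "Springfield, IL 62701",
]

print("top-down reader sees: ", street_top_down(address))
print("bottom-up reader sees:", street_bottom_up(address))
```

With two street lines in one address, the top-down reader and the bottom-up reader each settle on a different one, which is the whole mismatch.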
Google seems to be out to lunch in this area. The same exploits have been working for months. Yet Google doesn't list any such issues under "Known Issues". Over on Matt Cutts' blog, where you'd expect to see some discussion of this, he reports that he's writing a novel.
It's even worse at Bing. Bing emulated Google's October 27th merger of Places into web search within a few days. But they weren't ready. Look up "New York City locksmith" in Bing, and the five "Places" entries are all the same business.
- Like, fuck that for a start. Seriously..
Doubtless a good idea in there, somewhere, howevah (Gramma dont need to B perfect, just enter your email here, etc.) - wading through a veritable Dantes Inferno of Spam Monkeys Inc. in search of an entertaining way of earning 35c for 30 minutes work remains, long term, No Future 4 me.
Christ, but there is some serious crap in there - whoever brought my attention to this - Thanks 4 $haring! - feel like I need a shower in strong alcohol now.
Too expensive. They should outsource it to Mechanical Turk...
"I have been offered the online-perception-management services I'm talking about while managing at HP and Sourcelabs. If you are not aware of companys concern for their online perception and what they do about it, and won't take my word for it, there isn't much point in arguing about it with you." - by Bruce Perens (3872) on Friday July 30, @09:27PM (#33092398) Homepage Journal
SOURCE -> http://linux.slashdot.org/comments.pl?sid=1738364&cid=33092398
and
"It just takes one Ubuntu sympathizer or PR flack to minus-moderate any comment. Unfortunately, once PR agencies and so on started paying people to moderate online communities, and to have hundreds of accounts each, things changed." - by Bruce Perens (3872) on Friday July 30, @03:55PM (#33089192) Homepage Journal
SOURCE -> http://linux.slashdot.org/comments.pl?sid=1738364&cid=33089192
(So given that - Do you think that "fake accounts" and "spam" or "technically unjustified mod downs" & the like in trolling posts of others that may "threaten the 'powers-that-be'" etc./et al doesn't happen on /. as it does every place else? Guess again, per the above...)
APK
P.S.=> It's not just "malware makers" or "spam mailers" folks - Mr. Bruce Perens is also showing you that it's also done in the name of big companies as well, via "paid for trolls" in the big name companies' hire... apk
This is the first time I've heard about this service... And wages are low indeed. Hell, I could sacrifice one hour of my wages and get someone to do 10 hours' worth of work for me (okay, assuming I wouldn't have to pay any taxes. Perhaps 6 hours' worth now. But anyways). I don't know what kind of a project I could use this for, but now that I've heard that a service like this exists, I'll probably try to come up with some neat way to try.
I guess that the "all publicity is good publicity" mantra does hold true, here.
As for the "Who'd work for those wages?" question... Not everyone with internet access lives in the West. Even ignoring the worst shitholes and looking at a country like Russia, we have average wage at something like 500 a month (and less than double that for educated people such as engineers)...
Well, it would certainly be interesting if Google sits on this and then drops the hammer on everyone who did it.
Snowden and Manning are heroes.