Over 40% of New Mechanical Turk Jobs Involve Spam
An anonymous reader writes "An NYU study reveals that over 40% of the jobs posted by new employers on MTurk are some sort of spam request, such as fake account creation, fraudulent ad clicks, or fake comments, tweets, likes and votes. The study also shows that the bad jobs could be automatically filtered with 95% accuracy, but Amazon is not interested."
"We look forward to continuing to serve our AWS customers and are excited about several new things we have coming your way in the next few months."
Well, I'm looking forward to you confirming the deletion of my account I requested a week ago. And that 2nd part sounds like a threat.
I hope I didn't brain my damage.
So when 40% of their MT service usage is contrary to the ToS, everything's fine and dandy.
But when Wikileaks is in full compliance with the ToS of their EC2 service, they get the boot?
I had to look this up.
Amazon Mechanical Turk (beta)
Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mechanical Turk web service enables companies to programmatically access this marketplace and a diverse, on-demand workforce. Developers can leverage this service to build human intelligence directly into their applications.
While computing technology continues to improve, there are still many things that human beings can do much more effectively than computers, such as identifying objects in a photo or video, performing data de-duplication, transcribing audio recordings or researching data details. Traditionally, tasks like this have been accomplished by hiring a large temporary workforce (which is time consuming, expensive and difficult to scale) or have gone undone.
Mechanical Turk aims to make accessing human intelligence simple, scalable, and cost-effective. Businesses or developers needing tasks done (called Human Intelligence Tasks or “HITs”) can use the robust Mechanical Turk APIs to access thousands of high quality, low cost, global, on-demand workers—and then programmatically integrate the results of that work directly into their business processes and systems. Mechanical Turk enables developers and businesses to achieve their goals more quickly and at a lower cost than was previously possible.
Drill baby drill - on Mars
So, would the filtering of bad services from MTurk be performed using MTurk?
Same reason the USPS likes bulk mailers... they keep the operation afloat. Especially as more and more people turn to email.
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
That data is from two months back, before Google Places appeared in web search. Now, it's worse. There's a whole mini-industry in the "black hat" search engine "optimization" community creating phony Google Places entries. Here's an ad on Mechanical Turk today:
Google Places spamming hasn't been fully automated yet, so we get to watch spammers outsource their manual spamming. Spamming Google Places is incredibly easy, much easier than creating the link farms required to spam Google's old web search. See the instructions in "Dominating Google Maps- The Most Effective Spam Ever And What You Can Learn From It".
Google Places has been 0wned.
I find it interesting that the people placing the HITs have to decide whether the work done is good quality and then decide to pay or not. So that means for each tiny job you farm out, you have to do your own tiny bit of make-work to decide whether to pay or not. Can you farm this out on the turk too? If not, maybe there's a market for a service that lets you do so...
http://lkml.org/lkml/2005/8/20/95
From my time exploring mturk I would have guessed it to be much higher than that; non-spam-related jobs were definitely the minority of what I saw.
The creepiest (and highest paying) job I saw though involved watching surveillance footage from airports, making sure the automated face tracker stayed on target...
So, obviously, Wikileaks should have hired people at 0.0001 cents per word to type in the leaked documents.
Check your premises.
The surprise is that anyone noticed all these HIT requests.
Who, other than the utterly unemployable, has time to take on meaningless tasks dished out by machine for pennies? You can find more money lying on the ground in a parking lot.
A casual perusal didn't find one task I would do for fun or profit.
Sig Battery depleted. Reverting to safe mode.
I'm also surprised at how low the wages are at this Turk thing. ... I thought spammers had to at least sweat through that manual task by themselves.
It's like $0.25 per human-generated spam. Automation seems to be coming. I'm seeing mentions on black hat SEO forums that an automated tool for doing this in bulk will be released early next month.
Marketing fake numerical addresses in between legit ones ensures that Google Pagerank rates your "unique" business as #1...
Sometimes. That technique is mostly used to give real businesses extra bogus locations. Check out "New York City locksmith", for example. Other heavily spammed terms are "carpet cleaning" and "divorce lawyer".
This week's new technique is described at "How To Spam Google Maps For Top Google Place Listings". This is like SQL injection for mailing addresses. The trick depends on Google's parsing of mailing addresses from the top, while USPS standards say they should be parsed from the bottom line upward. So a mailing address with two street addresses is parsed differently by the USPS and Google, allowing the spammer to redirect Google's confirmation postcard to some mail drop.
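To make the parsing mismatch concrete, here's a rough sketch in Python. It is not Google's or the USPS's actual parser; the function names, the crude "line starts with a house number" test, and the sample address are all made up for illustration. It only shows how scanning the same lines from the top versus from the bottom picks a different street address.

```python
import re

# Hypothetical heuristic: a street line starts with a house number
# followed by at least one more word. Real address parsers are far
# more involved; this is only enough to show the ordering problem.
STREET_RE = re.compile(r"^\d+\s+\S+")


def street_line_top_down(lines):
    """Return the first street-looking line, scanning from the top."""
    for line in lines:
        if STREET_RE.match(line):
            return line
    return None


def street_line_bottom_up(lines):
    """Return the first street-looking line, scanning from the bottom."""
    for line in reversed(lines):
        if STREET_RE.match(line):
            return line
    return None


# Made-up listing with two street addresses.
address = [
    "Example Locksmith",
    "100 Main St",          # location the spammer wants shown on the map
    "200 Side Ave Box 7",   # mail drop that actually receives the postcard
    "New York, NY 10001",
]

print("top-down parser sees: ", street_line_top_down(address))
print("bottom-up parser sees:", street_line_bottom_up(address))
```

With the sample above, the top-down scan lands on the Main St line and the bottom-up scan lands on the Side Ave line, which is the disagreement the trick relies on.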
Google seems to be out to lunch in this area. The same exploits have been working for months. Yet Google doesn't list any such issues under "Known Issues". Over on Matt Cutts' blog, where you'd expect to see some discussion of this, he reports that he's writing a novel.
It's even worse at Bing. Bing emulated Google's October 27th merger of Places into web search within a few days. But they weren't ready. Look up "New York City locksmith" in Bing, and the five "Places" entries are all the same business.