Google's Manual For Its Unseen Human Raters
concealment writes "It's widely believed that Google search results are produced entirely by computer algorithms — in large part because Google would like this to be widely believed. But in fact a little-known group of home-worker humans plays a large part in the Google process. The way these raters go about their work has always been a mystery. Now, The Register has seen a copy of the guidelines Google issues to them."
"For relevance raters are advised to give a rating based on "Vital", "Useful", "Relevant", Slightly Relevant", "Off-Topic or Useless" or "Unratable"."
Hmmm, sounds like Slashdot. Anyone unemployed?
"It puts a good rating in the bin or else it gets the hose again"
What political party do you join when you don't like Bible-thumpers *or* hippies?
So I really can make $5000/month as a single mom?
First off, didn't read the article. Yeah, I said it. So if the article dispels this just ignore me.
What if Google actively uses the human ratings as a comparison/benchmark against which they measure those fancy algorithms? In other words, the raters are rating the algorithms more than they are the websites. Makes sense they would improve search results algorithms, a highly technical and scientific method of ranking sites (which is of little use to a human in and of itself), by constantly striving toward an unscientific and untechnical (e.g. "quality") method... humans... which after all is, you know, who uses the engine in the first place.
Amazon probably does the same to improve their suggestions model.
Truckin like the Doo-Dah man...
What are you looking for???
This is only believed by people who haven't thought about it very hard.
At an abstract level, it makes no sense to think that computer code can be optimized to perform a task without any human intervention. The reason is simple: the task we want the code to perform is always something that a human cares about. So, somehow we need a human to instruct the computer about the goals. This can take the form of a programmer meticulously coding the entire thing, with a particular human-relevant goal in mind. Or it can involve non-programmers providing feedback about how well the software is doing at its stated goal (depending on context, these people may be testers, evaluators, users, taggers, etc.).
More specifically, in the case of AI software, a typical procedure is to have a store of 'pre-tagged' training examples. These are examples of problems, with associated 'correct' answers. The training data is used to optimize the AI algorithm: the software can tweak its behavior in order to maximize accuracy of output on the training examples, with the hope that this will then generalize to general use. For something like web search, where the goal is to make a human end-user happy with the quality and relevance of the results, of course you need humans to assess the quality of the algorithmic results. This is the only way to keep the results relevant. (For search results, this is a continual and iterative process, since the web constantly changes, people are trying to game the system, etc.)
Thus, it's probably better to think of these raters as providing input for evaluating and refining the search algorithms, rather than thinking of them as people who get to uniquely decide the rank of pages. Obviously they will have an influence on the rank of the pages they rate, but overall they are evaluating a rather tiny fraction of the web pages in the Google database. Thus, when you perform an arbitrary web search, chances are the results you are seeing are purely algorithmic (none of the listed results were manually rated/adjusted by anyone).
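To make the "tweak the software against pre-tagged examples" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the two features (text_match, link_score), the single blending weight w, and the example data are all made up, and nothing here is claimed to resemble what Google actually does.

```python
# Hypothetical pre-tagged examples: (text_match, link_score, human_says_relevant).
# The features and values are invented for illustration only.
EXAMPLES = [
    (0.9, 0.2, True),
    (0.8, 0.7, True),
    (0.3, 0.9, False),
    (0.2, 0.1, False),
    (0.6, 0.8, True),
    (0.4, 0.3, False),
]


def score(text_match, link_score, w):
    """Blend two page features into one relevance score using weight w."""
    return w * text_match + (1 - w) * link_score


def accuracy(w, examples, threshold=0.5):
    """Fraction of examples where the thresholded score agrees with the human tag."""
    correct = sum(
        (score(t, l, w) >= threshold) == label for t, l, label in examples
    )
    return correct / len(examples)


def tune(examples, steps=10):
    """Try evenly spaced weights in [0, 1] and keep the most accurate one."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: accuracy(w, examples))


best_w = tune(EXAMPLES)
print(best_w, accuracy(best_w, EXAMPLES))
```

The humans never touch an individual result here. They only supply the tags, and the tags decide which weight wins. A real system would have far more parameters and would hold out some tagged examples to check that the tuned weights generalize.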
I mean what is the percent of traffic on the net related to porn and what is its percent represented on Google?
While it's often repeated that porn makes up the majority of traffic, in reality it's an almost insignificant amount. I'd bet that traffic only from Google exceeds porn traffic by several orders of magnitude.
No colour or religion ever stopped the bullet from a gun
Hang on...
Does this mean that the raters can view porn and claim that it's on the clock?
Apparently, if this is the case (which it probably is because Google's algorithms aren't AI), the tech sector needs a lot better rating.
For instance, do a search for a particular model of laptop. The results you get are of course mad online retail shops, but you also get a BUNCH of sites that have nothing to do with the product you searched for. They put the names / models in META tags and in hidden or font-size-reduced areas of the page, but the actual page content itself is just a bunch of crap that has nothing to do with laptops or laptop parts. It's just a bunch of random crap.
Point being, these aren't weeded out very well. Unfortunately, I don't have an example right now, but I know of one that has been in existence for years and still ranks in the top 5.
Oh, and the above is dwarfed by software name / functionality searches 10-1!
This was actually listed last year on several black/grey hat SEO websites to help dissect how google functioned. The upside is that with this wider exposure, google may change its policies a little.
I've actually been a Google rater. I spent about 2 years total doing it--long enough to become a 'moderator' who ensures quality feedback from other raters--in between, and supplemental to, "real" jobs. Raters give feedback on lots of Google services but it falls into two buckets: ranking the quality of legitimate results, and learning to spot the "spam".
Legit results are easy. Spam is more interesting. For one thing, I didn't entirely agree with their definition of what spam was--that's part of the reason you still see spammy results in some searches. The other part of course is that the spammers are constantly changing tactics. But it was actually kind of fun learning to spot the various methods spammers can use, and know that I was helping to improve search results by getting them off the front page (and hopefully off the top 100 pages).
But I always assumed that rater feedback was used to judge and adjust The Algorithm rather than individual page results. The Algorithm has always been king at Google.
"The way these raters go about their work has always been a mystery."
Not really. Anyone with half a brain could get to the second level of the work-from-home LeapForce exam, which is when they issue this guide. Nothing here is a secret or mystery.
If a page with a good manually set rating points to another page that other page should enjoy a good rep too. Perhaps for several "degrees of separation".
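A rough sketch of what that suggestion could look like, assuming a simple link graph and a made-up decay factor. The names propagate_ratings, decay, and max_hops are hypothetical, and this is only one way to read "several degrees of separation", not a description of any real ranking system.

```python
from collections import deque


def propagate_ratings(links, seeds, decay=0.5, max_hops=3):
    """Spread manually set ratings along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    seeds: dict mapping manually rated pages to their rating.
    Each hop multiplies the inherited rating by `decay`; propagation
    stops after `max_hops` hops. Manually rated pages keep their rating.
    """
    scores = dict(seeds)
    queue = deque((page, rating, 0) for page, rating in seeds.items())
    while queue:
        page, rating, hops = queue.popleft()
        if hops == max_hops:
            continue
        inherited = rating * decay
        for target in links.get(page, []):
            if target in seeds:
                continue  # never override a human rating
            if inherited > scores.get(target, 0.0):
                scores[target] = inherited
                queue.append((target, inherited, hops + 1))
    return scores


# Hypothetical chain of pages: a -> b -> c -> d -> e, with only "a" rated by hand.
links = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(propagate_ratings(links, {"a": 1.0}))
```

With the defaults, reputation fades by half per link and page "e" (four hops away) gets nothing. The decay matters: without it, one well-rated page linking out to junk would lend the junk its full reputation.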
...Missionary, doggy, doggy, doggy, missionary, blowjob, blowjob, tossed salad, missionary, small cock...
If you want news from today, you have to come back tomorrow.
I constantly search for things, and a good half the time, *maybe* a third are relevant. Then there's the times where it completely ignores my conditions. For example, I've searched for a blazer with -ladies, because, duh, I only want men's, and I get hits that explicitly, in the title, say "ladies".
I won't even *mention* Target, who *always* claims to have whatever you're looking for in a sponsored ad on the side, and doesn't....
mark
From Scrubs:
Dr. Cox: Listen Vanessa Janice Tiffany Amber Thiessen. I'm gonna go ahead and give ya a little something I call Perry's Perspective.
J.D.: I should have that tattooed on my neck.
It must have been something you assimilated. . . .
While it's often repeated that porn makes up the majority of traffic, in reality it's an almost insignificant amount. I'd bet that traffic only from Google exceeds porn traffic by several orders of magnitude.
This, of course, depends on your definition of porn. To many of the people who oppose it, half of network TV content is porn...
"Convictions are more dangerous enemies of truth than lies."
You don't even mention human input to search rankings in your troll. Did you have it all typed up before this article was even posted, waiting for the first Google-related post so you could try to slip it in without appearing to be completely off-topic (despite the fact that your troll is, in fact, completely off-topic for this particular article)?
"Convictions are more dangerous enemies of truth than lies."
There was a period of a couple of years when a web page hosted on my ISP's freebie 15 megabytes of web space was the top hit for a particular Google search. It was a good page--a lay discussion of a technical topic--and I enjoyed the ego boost, but I always wondered why, since I was not aware of its being linked from anywhere, let alone any high-traffic or high-credibility page. Now I think I know.
(I have since contributed that page's content to Wikipedia. The article has evolved with contributions from others but is still very recognizably mine... and I recently received the left-handed compliment of an angry email from someone who'd stumbled across my own web page and complained that I had plagiarized it from Wikipedia!)
"How to Do Nothing," kids activities, back in print!
Parent post is correct. Pages are not evaluated by people, but rather by an algorithm. And search results are not produced by people, but rather by an algorithm. But the algorithms don't magically appear. Those algorithms are written by people. But even smart people with good intentions cannot know if an algorithm is going to produce a good result or not. And this is where the human raters come into the picture. Their job really is to evaluate the variations of the algorithms introduced by the developers, to ensure that improvements to the algorithm make it through and other changes do not.
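As a sketch of how rater labels could decide whether an algorithm change "makes it through": score each variant's ranked results with a position-discounted sum of the ratings (a standard discounted cumulative gain) and keep the candidate only if it beats the baseline on average. The labels come from the rating scale quoted in the summary; the numeric gains and the pick_variant helper are assumptions for illustration, and "Unratable" results are assumed to be filtered out beforehand.

```python
import math

# Assumed numeric value for each rater label (the mapping is made up).
GAIN = {
    "Vital": 4,
    "Useful": 3,
    "Relevant": 2,
    "Slightly Relevant": 1,
    "Off-Topic or Useless": 0,
}


def dcg(rated_results):
    """Discounted cumulative gain: good results count more near the top."""
    return sum(
        GAIN[label] / math.log2(position + 2)
        for position, label in enumerate(rated_results)
    )


def pick_variant(baseline, candidate):
    """Each argument is a list of queries; each query is a list of rater
    labels in the order that variant ranked the results."""
    base = sum(dcg(q) for q in baseline) / len(baseline)
    cand = sum(dcg(q) for q in candidate) / len(candidate)
    return "candidate" if cand > base else "baseline"


# Hypothetical ratings for one query under each variant.
baseline = [["Relevant", "Vital", "Off-Topic or Useless"]]
candidate = [["Vital", "Relevant", "Off-Topic or Useless"]]
print(pick_variant(baseline, candidate))
```

Here the candidate wins because it moved the "Vital" result to the top. The raters judged pages, but what actually got accepted or rejected was the algorithm change.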
Do you care about the security of your wireless mouse?
Hey Slashdot! Why am i moderating for free, eh? Shoooow meeeeeee the monay!!!
Trolling is a clear practice of trying to provoke response. If you look at my comment history, you see a series of contributions that stimulate discussion.
You calling them a troll suggests you're personally threatened by them. That's fairly typical internet behavior.
Addressing your passive-aggressive question of relevance, yes, this is relevant: Google is claiming it's an automated system, but it's using human beings to fudge the results, much like it prioritizes results like Wikipedia to ensure that an answer is always forthcoming.
That's not "don't be evil." That's outright deceptive.
I'm sorry that wasn't obvious to you but given that you called me a troll for it, I suspect PEBKAC on your end.
It seems to me that Bing has improved a lot, and I've watched DuckDuckGo improve quite a bit. Google may be dipping a bit as it tries too hard to localize and customize search results.