The Demographics of Web Search
adaviel sends a link to work out of Yahoo Research indicating that demographics can help Web searches; e.g. a women searching for "wagner" probably wants the 18th-century German composer, while for men in the US "wagner" is a paint sprayer. The Yahoo researchers claim that by taking user demographics into account, "they managed to get the chosen link to appear as the top-ranked result 7 per cent more often than in the standard Yahoo search." New Scientist mentions this research and two other innovative adjuncts to current search practice: following the mouse cursor as a proxy for eye tracking, and taking back bearings on online criminals by studying the searches they make. (The latter raises disturbing privacy questions: would you want Google trolling through your search data? How about governments?)
Why is the story red and none of the others are?
> would you want Google trolling through your search data? How about governments?
- what do you mean 'would you want', who is asking you, plebes?
You can't handle the truth.
When I google "rock climbing" from Utah, "rock climbing in [my state]" already comes up in the suggestions. Seems Google already considers stuff like this.
As for all the JS to tell my search engine company where my mouse is... that's a lot of ajaxy data back and forth for no reason. Stupid, and none of their business.
Wagner was a 19th-century composer, not 18th.
> would you want Google trolling through your search data? How about governments?
Heck yes I want Google trolling through governments' search data.
I just know this thread is going to turn out bad.
If I as a thirty something male am searching for Wagner, I'm probably searching for a German composer.
Applying demographic data like this is a non-sequitur.
(Yes, I'm being facetious, but still. That Wagner example is pretty awful.)
Yes, that's really what we need...
What next, a search result that depends on your religion? If you type "Origin of the Universe", you get articles about the Bible if the engine thinks you're Christian, and scientific material otherwise?
They need to understand there is little value in subjective data. Their results are already biased enough, they should take steps to fix that, not make it worse.
I don't want my neighbors to find out about my obsessive and crippling fear of genetically engineered dinosaurs next time they do a search for "Toronto Raptors" from my computer.
Procrastination Man strikes again!
Isn't Yahoo pretty much in the process of outsourcing their search to MS?
One that hath name thou can not otter
When I'm searching for pregnant-futanari-on-hermaphrodite-furry, I really mean pregnant-futanari-on-hermaphrodite-furry.
"e.g. a women searching for "wagner" probably wants the 18th-century German composer"
A -- women -- ???
I see a FLOOD of this, women used where woman should be used and woman where women should be used.
Wow......
A quick way to lose market share by annoying searchers who get misidentified.
The term "wagner" may indeed be looking for:
Richard Wagner - 19th Century composer
William Wagner - some sort of rounders player, I believe....
Wagner Spraytech - coating applicators
Wagner College - an educational institution
Wagner - a 1983 fillum
Robert Wagner - an actor
Lindsay Wagner - an actress
Connell Wagner - a civil engineering consultancy
anyhow, my point is that these came up on the first page of a Google search. If I were misdirected, I wouldn't be very impressed...
I don't disagree with the general principle, but I have to wonder if 7 percent is worth the time, effort, and privacy issues involved. Also, note that the 7% is of a specific 30% subset; the actual value for all queries is 1.5%. I then have to ask how many of those 'upgraded' top-ranked results were already near the top (i.e. in the top 10/first page of results). I feel that the whole idea is getting less fruitful by the second... - S
I'm not paranoid!
Came in here just to say that... Come on, people... just sound it out! I don't know anyone who pronounces those two words the same way.
They must have my demographic setting wrong. Half my searches for naked women come back with women's undergarment stores.
Joking aside, when you've got multiple people of different genders (such as in your average multifamily dwelling) using the same computer, such demographic results won't work too well. I wonder if this might explain, in part, why my search results really are less pertinent when I'm not signed into my gmail account.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
If you're searching for something where this would help, like Home Depot products, and you actually fit your demographic, then great: add a button that keeps you in your area and helps you avoid German composers.
To me though, this would be very restricting if I'm truly trying to look up something I (and therefore maybe my demographic) know just a little about. Steering me back to results that I already know about would get to be very annoying when what I am looking for isn't usually searched by my demographic.
... not!
When I was living in France for a while (job related), I was quite annoyed by all those websites that assumed that because my computer's IP was in France I wanted to see the site in French, even if the site was a .com and I explicitly tried to click the "English" link. (My French is good enough to buy some baguettes with rillettes, but not for reading technical articles.)
This goes into the same direction: It works in many cases but when it doesn't, it will piss off the user.
... this idea smacks of a tool that's trying to be *too* helpful, and ends up getting in the way. Kinda like the old Microsoft paperclip. I went and turned off this function in Google accounts when I realized that my search results were being shaped based on my history, since that partially defeats my expectations of how a search engine behaves, and degrades the utility, insofar as the utility (to me, the user) is based on receiving an unbiased sampling of the matches. I'm also troubled by this trend in the way that Google delivers their news offerings. It seems that the logical progression of this is that we will mostly only be exposed to material that fits our highly individualized pre-existing reality bubbles.
The first thing I thought of when I read Wagner was the popular brand of jeans.
There were/are gender predictors out there that will look through your search history and try to predict what gender you are. They were mildly successful (though dead wrong in my case). I think I prefer Google's more invasive yet more accurate method of paying attention to which results I click on and giving me more of the same without regard to gender or age. I DO like getting local results though.
As far as women vs woman goes ... tsk! Just think, "would I use man or men here?", and then add a wo onto the front of it. It's not that hard.
I'm not a bird, I'm a super-advanced flying stealth dinosaur!
Search history presents a great potential for loss of IP. I do technology development in an area of considerable interest/value. From looking at my search entries, it would be pretty easy to determine the directions of my development work and anticipate it. It's clear that search history mining is gonna happen. I'm interested in anonymizing my search activities as a result.
A search engine is supposed to find things which fit the regexp that you request.
Often someone will tell me in a forum to "search for x in google", what happens when the results are not exactly the same worldwide because of this technique?
Also, there are loads of people who use proxies and so on to search the web (like people in China). Their demographics would appear all skewed, because it would seem that someone in the proxy's country is the one requesting the search for webpage x.
I don't agree with this technique at all. It just doesn't fit. Imagine if 'egrep' started filtering strings based on additional info that you could not easily control (like timezone), it would be annoying.
The search results are not just a regex matching. A modern search engine, like Google's, returns a ranked list of search results to you, and this ranking already has bias: the Pagerank algorithm sorts the results based on how popular the page is, as measured by the number of incoming links to that page. Of course, that is the general gyst of Pagerank as of the Google founders' research paper back in the late 1990s, and undoubtedly Google and other search engines have fine-tuned their algorithms since then to return "better" results to the user. But the point is still that there is already bias in the results.
Make no mistake: Google has surely already thought of search result ranking algorithms similar to the one posed in this Yahoo Research paper. The difference is that Google does not have a research arm like Yahoo, so they do not publish ideas like this. In hindsight, the Google founders were foolish to publish their Pagerank algorithm in the first place, but they were still at Stanford then.
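To make the "ranking by incoming links" point concrete, here is a minimal sketch of the textbook power-iteration form of Pagerank. It is not Google's production code; the function name `pagerank`, the damping value 0.85, the iteration count, and the four-page toy web are all assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy Pagerank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
ranked = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy web, "c" ends up ranked first purely because more pages link to it, which is exactly the kind of built-in bias being described: the ordering already reflects link popularity, not just whether the text matches.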
Apparently search *wasn't* able to teach the author to spell. :)
> following the mouse cursor as a proxy for eye tracking
And if the user turns out to never touch the mouse? Keylogging every single character pressed? This is plain absurd.
Only (a) Fräulein would be interested in Wagner.
We are borg. Resistance is futile. Make us a sammich and give us your wallet, man-slave.
#fuckbeta #iamslashdot #dicemustdie
Fewer people would think that "women", "children" etc. are singular if we wrote "womans", "childs". Blame the language.
I thought Wagner was a 20th-century actor. He was in the British TV drama "Colditz" and also had a minor role in The Longest Day. Probably best known for "Hart to Hart".
I expect something like this
For justice, we must go to Don Corleone
This would not be an issue if Google simply did not save that information. Sure, I know: they say they want all that information for "targeted advertising". BUT... surveys have shown that people do not want "targeted advertising" in the first place! Despite claims of the "benefits" to consumers, turns out they're not interested if it means losing privacy.
Wilhelm Richard Wagner was born in 1813 and died in 1883 which makes him a 19th Century German composer, not an 18th c. German composer.
Remember, here in 2010 it's the 21st century; in 1910 it was the 20th c.; in 1810 it was the 19th c., etc.
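The rule of thumb above can be written as a one-line calculation. This is just a sketch; the function name `century_of` is made up for illustration, and it assumes positive (AD) years with centuries running from year 1 to year 100, 101 to 200, and so on.

```python
def century_of(year):
    """Return the ordinal century number for a positive (AD) year."""
    if year < 1:
        raise ValueError("this sketch only handles AD years")
    # Years 1-100 are the 1st century, 101-200 the 2nd, and so on.
    return (year - 1) // 100 + 1


# Wagner's birth and death years, plus the years mentioned above.
for year in (1813, 1883, 1810, 1910, 2010):
    print(year, "->", century_of(year))
```

Both 1813 and 1883 map to 19, so Wagner falls in the 19th century by this rule.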
No, blame the laziness of Americans in general to learn proper English.
Mkay?
What if I'm a woman and I WANTED the paint brand, huh? (Or a more pertinent issue: what if I start looking up MMOs and it tries to steer me towards Bella Sera or something instead of WoW?)
Seriously, this has "Bad Idea" written all over it, for the criticism levied against it for entrenching gender stereotypes if nothing else.
That would probably be Georg Gottfried Wagner (1698-1756), who also played violin for Bach (1685-1750), another 18th-century composer, and not to be confused with Leonhard Emil Bach (1849-1902), a 19th-century composer.
Either that or KDawson thinks that "18 century" means "1800s."
(I am a musicologist, but I am not your musicologist, and this post is not intended as musicological advice).
No it probably would not.
If you look hard enough, you can find someone with any given last name who wrote some music in any given century.
Funny part is that he got "e.g." right, which is pretty rare...
When Google started to change from just linking the "Did you mean?" results to actually inserting them in place of the results for what I actually searched for, I realized on some level that this might be appropriate for people who don't know what they're doing and aren't paying attention, and that those people might be in the majority... But I didn't bother mentioning that in my angry feedback. =)
Maybe Google doesn't care about customer feedback because they're not in a position where they have to worry about the quality of the customer experience; if so, I hope they notice when that changes.
How are the search engines capable of doing this on their own? It needs to be remembered that almost 80% of internet users (in India at least) use dynamic IPs. Most ISPs here charge extra for a static IP and most users just don't bother. What use would the average layman user have for a static IP? I'm assuming that's how it is in most other places too. Correlating searches and search patterns with demographic details needs active cooperation from all ISPs, doesn't it?
And oh, thanks to the submitter for reminding me that Yahoo has a search engine too :-)
How many times do we search for one keyword (or even a string), spelled exactly so? Just like in a library catalogue. The last thing we want is some algorithm applying an undocumented filter to our search results.
It's bad enough that Google insists on fuzzifying that string (even when you put it between quotes), but when it starts interpreting my search intent based on my demographic profile is when I will stop using it.
I want the same damn results anyone else gets from making the same searches. Why would I want it any different?
I'm not searching for something I already know, I'm searching for something someone else already knows.
We are all God's parents.
It's irritating enough to have to go specifically to a certain country domain for google to get the desired results (searching on the wrong one gives you junk for a query that clearly has no matches within that country, instead of out-of-country matches). And now I have to convince google I'm a girl to get to the work of the old nazi-loving composer?
Couldn't they base their results on which links I've clicked on in previous search results instead, if they have to personalize it?
Maybe someone should sue them for gender discrimination?
Yahoo Research says: e.g. a women searching for "wagner" probably wants the 18th-century German composer, while for men in the US "wagner" is a paint sprayer.
Google says: e.g. a women searching for "wagner" probably wants the 18th-century German composer, while for men in the US "wagner" is a porn star
Gee, I wonder which one men are gonna use...
It's OK, you don't need to give us an excuse.
To have a right to do a thing is not at all the same as to be right in doing it
THIS! I too have a major hate of forced localization. Every time I set up a new browser and load up Google, it goes to google.de (I'm in Germany, I speak the language well enough, but I want the content that I want, you stupid f'ing websites!). Even worse is Comedy Central and their South Park clips: an English-language blog embeds a South Park clip from Comedy Central, I click play, and guess what happens? The clip is dubbed in German! Aaarrrrggghhh!!!
Also trying to read myspace profiles (why, why?) gets pretty fucking irritating when it localizes the standard terms as "Favorite music", "Comments", etc, but then after the ":" displays the stuff the user's filled in, in their original language (usually English), meaning you have to read localized and then English words within the same sentence.
God damned morons all of them...
What time is it/will be over there? Check with my iPhone app!
Just wait until you notice the pervasiveness of 'then' being used in place of 'than'.
And the most hated 'the car needs washed' or 'that needs fixed'. Every time I hear that usage I feel the urge to start braining people with the complete works of Shakespeare while screeching "TO BE! TO BE!"
Completely off-topic but I feel OH so much better for venting.
No kidding. And using the word "seen" instead of "had seen" or "saw".... *sigh*
If I want to search for Wagner paint sprayers, I'll search for Wagner paint sprayers, not just Wagner.
Use the Wild Asian Ass example instead. A woman may be searching for the animal. A man may be searching for something else.
Not to stereotype anyone, some ladies may want to search out Wild Asian Ass, too.
Now, I need some time alone.
No, blame the laziness of people in general to learn proper English.
ftfy
I like a good yank bashing too, but I see just as much of this coming from the UK as from the US.
Is 1563649 a prime number?
> e.g. a women searching for "wagner" probably wants the 18th-century German composer
As would anyone taking or interested in Philosophy. Of course this is an example of a terrible search query, like searching for "strange" without specifying "quark", but I'll take the generalized results all the same, thanks. Please don't tell me what I want.
https://www.eff.org/https-everywhere
And I first thought of Tina Wagner.