Concerns Over Microsoft's Internet User Profiling
jcatcw writes "Microsoft research on Internet user profiling could lead to tools that help repressive regimes identify anonymous dissidents, the Reporters Without Borders advocacy group warned last Friday. Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time. "In China, it is conceivable that this type of technology would be used to spot Internet users who regularly access such 'subversive' content as news and information websites critical of the regime," the group said."
I'd be uneasy about partnering with a bunch of totalitarian control freaks like Microsoft.
Unless you happen to be the only 20-25 year old male in China I think you're safe.
80% and 60% are both actually very poor accuracies. I wouldn't be worried; this won't be taken seriously as any type of reliable profiling.
Nothing is more dangerous than a programmer with a screwdriver.
... wouldn't it be easier to look up the IP address and persuade the ISP to hand over the user details?
The old ways are often the best.
Old COBOL programmers never die. They just code in C.
What percentage of net users are male? Female? If 80% are male and the algorithm just guessed male all the time, unless they bought 'ladies' items' online, then it would have pretty good accuracy. Age group usage would follow a bell curve, so again it's not difficult to skew your results to reflect favourably on your algorithm.
Unless of course these results were made under strict scientific observance and impartiality... nah!
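To put a number on the "always guess male" baseline, here is a minimal Python sketch. The 80/20 split is a made-up illustration, not a figure from the Microsoft paper, and `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a 'classifier' that always predicts the most common label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical population: 80 male and 20 female users (illustrative only).
sample = ["male"] * 80 + ["female"] * 20
baseline = majority_baseline_accuracy(sample)
print(f"always-guess-majority accuracy: {baseline:.0%}")
```

Any reported accuracy only means something relative to this kind of baseline, which depends on how skewed the real population is.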
"I am not bound to please thee with my answers" [William Shakespeare]
Link to paper. I don't claim to be knowledgeable about this stuff but that success rate doesn't look too remarkable to me. China's sex ratio is hardly so skewed (yet, anyway) that this could remotely identify someone from a pool of a billion users, or even out of a single Internet cafe.
I'd wonder more about the quality of research Microsoft is getting out of their Beijing site if they think this worth bragging about.
What I'm listening to now on Pandora...
If you read this post you are probably male. (80% correct)
You are probably in the 20-35 age group. (60% correct)
(I know I'll only get negative responses to this post of the type "I'm reading this and I'm a 47-year-old woman!" That's Ok. You're in the other bracket.)
My algorithm is as good as Microsoft's. Can I have a research grant please?
erroneous: look me up in a dictionary
Your gender is
Your age is
Reduce, reuse, cycle
Why don't we just enter our SSN every time we browse the web, and we can avoid all this 60-80% nonsense. All sites visited would be logged in a central database and would be used to deliver targeted advertising. Just think! Based on my browsing habits, every website will look like torrentspy.com.
Copyright 2010. All rights reserved. This comment may not be copied in any way including, but not limited to caching.
Printable Version
Right now this doesn't worry me too much - after all, how much "identification of anonymous dissidents" could someone do based only on one's gender and a rough estimate of age? On the other hand, if Microsoft do expand to geographical location, occupation, and educational degree as mentioned, then it's rather worrying.
This is another example of why it's important to ensure that corporations aren't allowed to collect and store huge amounts of data about individuals. The fact that they can analyse it in some way or another is irrelevant if privacy is respected in the first place.
First things first: why China? (The same question applies to Venezuela, Russia, Brazil or whatever is the target of the Slashdot "fifteen minutes of hate" of the day.) Of course people should be concerned about what these countries do wrt losses of privacy and basic rights, but what about the U.S. and the E.U.? As we speak, they are working on a new agreement to share data from passengers on trans-Atlantic flights, a much more effective way to profile people, because it contains name, address, gender, destination, credit card number, everything. There is no need to make any kind of assumption; everything is plain and clear. This is why I think that not only "in China", as the summary states, but in most countries in the world, this information can and will be used to tag people indiscriminately, subversive or not, terrorist or not, law abiding or not. So, take care of your own backyard before pointing at the poison ivy in your neighbor's.
Second, it is not as if Microsoft were the only one researching and developing in this field. More than that, even if Microsoft were not researching in this field, any government interested in this kind of technology would research it itself, or fund research at its public universities. So, throwing Microsoft's name into the mix only reinforces my point: this submission is nothing but flamebait, the flame targets being the usual suspects, proprietary software and communism.
What? You think that the government of China doesn't have the resources or smarts to do this research and development themselves? Come on, shutting down research because the Chinese government might use it badly is very, very silly.
~ a low user id is no indication I have a clue what I'm talking about.
Any technology that reduces anonymization could have this effect. It's a tradeoff--ease of mobility for dissidents vs eliminating obnoxious assholes. So every time you have to suffer through a troll, you're protecting freedom-loving Chinese!
So now, in addition to Tor et al. and the things that help privacy (sending Google random data as search queries), all we will have to do is have something in the background opening up male/female sites over all popular age ranges. Way to cream everyone's bandwidth. Sheesh, is there anything you CAN get right, MicroJerk?
Profiling is akin to racism in my book. It's against democracy any way you look at it.
I understand the concern that this software, if it became more accurate, could be used by repressive regimes against their citizens. But as far as priorities go, I think they would do better to concentrate on bringing attention to human rights violations and educating people about the rule of law.
No, see, it's okay when Google does it, because they said they would "do no evil." And we believe them! They're a big company whose name isn't Microsoft, so we automatically love them without questioning!
This reminds me of the scenes in Casablanca where the police are told to round up the usual suspects. Ultimately the accuracy doesn't matter to the government anyway. Worldwide, I think we are moving in a direction of less freedom rather than more, spearheaded by wrong-headed anti-terrorism hysteria in the US. So why should they care about accuracy, they'll just round up whoever fits the profile and sort it out later, or not.
To the making of books there is no end, so let's get started
"Our new algorithm you ask? Well, we took a look at the MSN search log and counted the ratio of keywords such as 'boobs' and 'anal'." Considering the various interests and beliefs of men and women, no algorithm could accurately guess a gender, even age.
"ATI cards are like buses...They're huge, red and have bad drivers."
What about somebody writing a browser extension that performs bogus searches in the background, for no better reason than to frustrate "profiling" attempts? Is this feasible?
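On feasibility: generating the noise is the easy half. Below is a rough Python sketch of just that part, mixing real queries with decoys. The topic list and the function names are hypothetical, nothing is actually sent anywhere, and the browser-extension plumbing is not shown.

```python
import random

# Hypothetical decoy vocabulary; a usable tool would need a much larger,
# regularly changing list so the decoys are not trivially filtered out.
DECOY_TOPICS = {
    "cooking": ["slow cooker stew", "sourdough starter", "knife sharpening"],
    "sports": ["league table", "marathon training plan", "cricket scores"],
    "gardening": ["tomato blight", "raised bed soil", "pruning roses"],
    "finance": ["mortgage calculator", "pension rules", "exchange rates"],
}

def decoy_queries(n, rng):
    """Return n decoy queries drawn from randomly chosen topics."""
    topics = list(DECOY_TOPICS)
    return [rng.choice(DECOY_TOPICS[rng.choice(topics)]) for _ in range(n)]

def mix_with_decoys(real_queries, decoys_per_real=3, rng=None):
    """Hide each real query among decoys_per_real decoys, in shuffled order."""
    rng = rng or random.Random()
    mixed = list(real_queries)
    mixed += decoy_queries(decoys_per_real * len(real_queries), rng)
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    for query in mix_with_decoys(["news critical of the regime"]):
        print(query)
```

Whether this frustrates profiling in practice is a separate question: decoys drawn from a small fixed list are easy to strip out, and every extra request costs bandwidth.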
Je fume. Tu fumes. Nous fûmes!
They detailed how much of the information they gather is from using MS related software such as Internet Explorer, etc.
Which is to say, how much info can be gathered from someone using a non-MS browser such as Firefox with a non-MS operating system such as Apple or Linux, and avoiding MS-dominated web sites?
The more important questions are a) to what extent is this important to free societies and to breaking the grip of totalitarian regimes on their societies?, and b) to what extent do we as members of free societies need to revolt against "corporate society" and its accelerating tendency to glom together data about private individuals?
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
I don't get it. No matter what age-group/gender combination you think of, even combined with geodata and occupation/education levels, this doesn't even come close to identifying individuals. Unless you actually believe "a male farmer in his 30s in the Shanghai region" or "a female grandmother in the suburbs of Houston" is significantly detailed.
It makes more sense to worry about accidentally getting linked to personal details left in instant messaging, e-mail, community profiles and/or conversations.
Well you you why...
ps: Humour
Right, so they're about 50% sure that it's someone who's both male and aged 24-30, living in China. It should be easy to pick out the individual from there.
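The "about 50%" figure is just the two quoted accuracies multiplied together. A quick Python check, assuming the gender and age guesses are independent (the summary does not say whether they are); `joint_accuracy` is a hypothetical helper name.

```python
from fractions import Fraction

def joint_accuracy(*accuracies):
    """Chance that every guess is right, ASSUMING the guesses are independent."""
    result = Fraction(1)
    for acc in accuracies:
        result *= Fraction(acc)
    return result

gender_acc = Fraction(80, 100)  # 80% from the summary
age_acc = Fraction(60, 100)     # 60% from the summary
both = joint_accuracy(gender_acc, age_acc)
print(f"both correct: {float(both):.0%}")
```

If the errors are correlated, the real joint figure could be higher or lower than this product.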
Can anyone tell me how to set my sig on Slashdot?
wouldn't it be easier to look up the IP address and persuade the ISP to hand over the user details?
Requiring the ISP to keep records with "wiretapping" laws and then getting the details is the US method. Farming out the collation of records to a company like Choice Point goes beyond the laws and is both cheap and efficient.
In China, the regime is the ISP and they have the best equipment and methods that US companies could provide.
Friends don't help friends install M$ junk.
No, it's not OK when Google does things that might be used to harm people who have done nothing wrong.
This thread, however, is about the much nastier things that M$ does gleefully. I'd enjoy it if you compared the details.
Friends don't help friends install M$ junk.
... which is a completely different and only slightly correlated attribute.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time.
A bit strange in my opinion. "Guessing the gender correctly" already has a 50% chance even if you don't have any data about the user. So there is not much improvement here.
But the age... if you really guess the age, that's more difficult. If we say we have everyone on the Internet up to the age of 100, you have a 1% chance of guessing the age - much less than 50%.
So even if you improve only up to 60%; getting from 1% to 60% is much more than from 50% to 80%.
Perhaps they mean "guessing the age with a tolerance of 10 years"
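The comparison above can be written out as "lift over blind guessing". A small Python sketch, assuming every class is equally likely and that age means 100 one-year classes as in the comment; both are simplifying assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def chance_accuracy(num_classes):
    """Accuracy of a uniform random guess over equally likely classes."""
    return Fraction(1, num_classes)

def lift_over_chance(accuracy, num_classes):
    """How many times better than a uniform random guess the accuracy is."""
    return accuracy / chance_accuracy(num_classes)

gender_lift = lift_over_chance(Fraction(80, 100), 2)   # two genders
age_lift = lift_over_chance(Fraction(60, 100), 100)    # ages as 100 one-year classes
print(f"gender: {float(gender_lift):.1f}x chance")
print(f"age:    {float(age_lift):.1f}x chance")
```

If the paper actually scores age in a handful of brackets rather than exact years, the age lift shrinks accordingly: with five brackets, 60% is only 3x chance.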
Not as good as Micro$oft, though
...what sort of profile they'd get running that on /. posts?
Have gnu, will travel.
This doesn't sound like a problem at all.
To use it, the regime would have to know about other sites the user had visited to input that information into the algo.
If they can already uniquely identify a user across multiple sites, then they would already know who that user is.
All this is useful for is processing information for marketing after using your phishing-detecting (IE7, Google Toolbar) software to spy on your users.
...and that is all I have to say about that.
http://jessta.id.au
>So, throwing Microsoft's name into the mix only reinforces my point: this submission is nothing but flamebait
Nonsense. Microsoft's power and influence alone justifies concern about whatever they do. The same could be said of China. Your point is thus totally lost on me.
Are there a lot of people who, for example stay logged in to Google while checking email, searching the web, looking up maps etc? I log out every time. No personalized search for me thank you very much. Speaking of... lemme clear my cookies. hehehe
http://www.rsf.org/article.php3?id_article=22379
the other,
http://www.rsf.org/rubrique.php3?id_rubrique=20
is a splash page, me thinks
Unfortunately it is TOO effective. As a result, Americans spend way more money than they should. People stopped making informed purchases; they just buy the product of whoever has the most effective ad. This translates 100% into politics. The public no longer cares about budget deficits, nor about the candidate's stance on all issues. They are more likely to get behind the candidate with the best ads.
willingly give up all rights of privacy so they can be good citizen comrades in the willing partnership of the Corporate State and the citizen comrades!
All power to the Rights Fuhrer Bill Gates and Citizen Comrade Chiefs who free us from worry about nasty anonymity!
-- Tigger warning: This post may contain tiggers! --
Screw communism (no offense, read what I have to say). What do you think democracies are going to do with this? I can imagine congressmen buying $60,000.00 copies of this product to determine exactly who's in their voting region, match up the IPs on the internet, and then use that to further harass us every time an election comes around! It's madness! At least in Communist states they put you out of your misery and do you in! Here in America they keep you alive so they can keep exploiting you... "It's the worst system of government - except for all others tried."
Consider yourself spoken to.
He who controls the past controls the future, and he who controls the present controls the past. In a world where all information can be changed at the flick of a switch, the only thing you can ever be sure of is that which you witness with your own eyes. Everything outside of yourself can be, and frequently is, rewritten to the whims of whatever regime happens to be in power at the time.
SRSLY.
Incredible. Thank you for telling me that it isn't OK when Google does things to harm people.
Can you show me where Microsoft does these things gleefully? Proof would consist of evil cackles recorded in MP3 files. Good luck.
(I'm no apologist, I only make fun of the groupthink around here.)
Don't bitch about it.