Randomizing Survey Answers For Accuracy
Saint Aardvark writes: "The New York Times reports that two researchers at IBM have come up with a way to persuade people to give correct answers to survey questions: randomize the results. Strangely enough, they can get accurate information out of the aggregate of enough answers -- but it's completely anonymized. Since conservative estimates say nearly half of all survey answers are bogus, there's an interest in persuading people to be more truthful. As ever, you can use the Random NY Times Registration Generator to falsify your registration details and read the article..."
Ok, fine. They've managed to come up with a model that doesn't actually collect any data. And how will this help people to enter REAL data? People don't give data because they don't trust the company. If they don't trust the company, do you really think they'll believe some mumbo-jumbo about "randomizing"?
Javascript + Nintendo DSi = DSiCade
Do they expect that people will enter real data on the mere promise that it will be stored in some randomized, aggregate, or other form that does not invade their privacy? If the corporation could not be trusted in the first place, no statement they make will make them trustworthy.
Sounds all fine and dandy for science, but people are usually honest with a professional researcher who is going to guarantee their anonymity, and moreover the research data is going to be used for something tangible rather than selling something right back to them.
Market researchers want information on YOU. They want generic info on your demographics, but this information has been available from other venues for a long time. When spyware and other information-gathering techniques are employed against someone, they are being used to collect data to target marketing at that person specifically. Literally employed against that person.
As such, I'll still say that I'm female, in my 50's, from Yemen and making less than $12,000 a year. Randomize away.
What a pointless "technology".
Not at all, not at all. Like 80% of the stuff these days, it exists merely to get some nice paperwork for the students, after that it will be forgotten. Once they have their Masters/Doctorate in an incredibly narrow field, gotten themselves into debt, given money to textbook makers and given jobs to profs, they will have their paper that will get them a nice nice job, all the while perpetuating the myth of higher education and raising the bar for everyone else.
Hardly pointless, is it? I mean, it's the only way for a modern society to still use capitalism.
I think there is something to be said about companies that ask for information as an option versus companies that ask for information as a requirement.
For example, company XYZ has released a program called Widget. In order to download Widget, users are asked to fill out a survey so that XYZ may gauge the demographics of their target audience.
Some sites will allow you to bypass this step and proceed to download the software. Other sites require this information before revealing the download link. I think that the psychological difference between "required" and "optional" would heavily influence the honesty of the answers.
I know that I never honestly fill out required forms. I'll fill in a bunch of bogus details, get the link, and be on my way. However, if the form is optional, I may download first and, if I like the program, provide some details to the company. The difference? I'm not being forced to give anything up in advance.
Is this true in general? I don't know. But it makes sense to me.
I have an idea for something to replace the survey forms - an AI program to carry out a conversation with the user. Ah ha! We just have to watch out for users that say to the AI - "I am lying" - and hope the AI doesn't need therapy.
Price, Quality, Time. Pick none. What, you thought you had a choice?
Let me summarize:
1) People lie on surveys, most likely because they don't trust the taker - but probably also just because they like putting in other answers (yeah, I'm a millionaire, woohoo!, etc). This only addresses the trust issue, ignoring other potential sources of lying.
2) In order to work around the trust issue, they've developed a method of injecting random noise into the original answers as they are recorded and then extracting useful data in the end.
Notice their technology doesn't do anything to fix the underlying problem. The hope is that users will understand and trust the backend randomizer system, and that based on this trust they will answer more truthfully.
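For anyone who wants to see what point 2 amounts to, here is a minimal Python sketch. It assumes the noise is uniform with a publicly known width, which the summary above does not specify, and it only recovers a mean and a variance rather than a full distribution. The names `perturb` and `estimate_mean_and_variance`, the noise width, and the sample ages are all made up for illustration and are not taken from the IBM work.

```python
import random

NOISE_WIDTH = 30.0  # assumption: noise is uniform on [-30, +30] and the width is public


def perturb(value, rng):
    """Return the answer with random noise added; only this value is recorded."""
    return value + rng.uniform(-NOISE_WIDTH, NOISE_WIDTH)


def estimate_mean_and_variance(recorded):
    """Recover aggregate statistics of the true answers from the noisy ones.

    The noise has mean 0 and variance NOISE_WIDTH**2 / 3, so the mean carries
    over unchanged and the noise variance can be subtracted out.
    """
    n = len(recorded)
    mean = sum(recorded) / n
    noisy_variance = sum((x - mean) ** 2 for x in recorded) / n
    return mean, max(noisy_variance - NOISE_WIDTH ** 2 / 3, 0.0)


rng = random.Random(0)
true_ages = [rng.gauss(40, 10) for _ in range(50_000)]  # hypothetical respondents
recorded_ages = [perturb(age, rng) for age in true_ages]
est_mean, est_var = estimate_mean_and_variance(recorded_ages)
```

A single recorded age can be off by as much as 30 years in either direction, so it says little about the person, while the estimates over the whole group land close to the true mean of 40 and variance of 100.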
Without bothering with all this mumbo-jumbo, I can build a trustworthy system. I simply record survey statistics, and I promise not to use the individuals' personal data individually.
They can either trust me that I'm telling the truth about this, or they can lie. In the IBM researchers' scenario, the users are again asked to trust that the backend system doesn't compromise them, and again they can choose to trust it or choose to lie.
Given the above, why on earth would you bother with this research and unnecessary complexity? It's not going to make any difference over just promising your users that you don't invade their privacy. You could replace their research results with a banner on top of the survey that says "After you submit your data to us, we use Magical HibiJibi technology to prevent ourselves from invading your privacy, so please trust us and answer truthfully".
What a waste of research.
11*43+456^2
Interesting approach, but useless unless people actually understand and trust the system. For this to happen will probably require widespread adoption, an easy to understand explanation of the process, and assurances that answers really are randomized. These requirements obviously force a bit of a chicken and the egg scenario.
Explaining the whole randomization process (how it protects privacy, how it provides useful info) will be a little much for most people I think, but a good user interface might alleviate this, perhaps with a 'randomize' button that is used before hitting the 'submit' button. This would take the user input and change it right in front of their eyes. Of course many would be rightfully concerned that the randomize button is just for show (or simply encodes but doesn't anonymize), but I think that enough people might buy into the false sense of security that demonstrated 'randomization' provides to at least partly improve the percentage of bona fide results. Also, the system could be set up so users who don't mind submitting traceable information could be encouraged ("extra 10% off") to submit without randomization, with a simple flag sorting data into randomized/anonymous and non-randomized/non-anonymous data.
This approach would be even better if the randomization approach becomes a ubiquitous standard backed by a consistent and legally accountable and well-known entity/brand (IBM for instance). I'm not sure how well an open solution would work unless there was a central group assuming responsibility and accountability for the system, enforcing trademarks, and suing spoofers. Also, people feel safer when they feel there's someone to blame for any abuse/mistakes (hence, giving their credit card freely to a waiter but not to a website).
My next sig will be ready soon, but friends can beat the rush!
However, since the reconstruction error would depend on the number of respondents, which will vary dramatically from site to site, I might also guess the 5% number was rectally extracted, and only used to make a point for the article that it will still be better than the error due to respondents lying, despite not being perfect.
All of this, of course, under the dubious assumption that people will stop lying just because random numbers have been added to their information, as numerous other posts here have discussed...
*...yea..yea..I know, there's no such thing as a perfect random number generator, but those tests you hear about mathematicians running on RNG algorithms are for the truly anal-retentive who are worried about patterns showing up after the 2^64th repetition or whatever. I doubt that even a relatively low-tech random number algorithm would be taxed by this technique.
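To put a rough shape on how the error depends on the number of respondents, here is a small Python sketch. It is a simplification: it looks only at the error in an estimated average, not at reconstructing a whole distribution, and it assumes independent zero-mean noise. The function name `standard_error` and the spreads of 10 and 20 are hypothetical and have nothing to do with the 5% figure from the article.

```python
import math


def standard_error(true_sd, noise_sd, respondents):
    """Standard error of the estimated mean when independent zero-mean noise
    with standard deviation noise_sd is added to every answer."""
    return math.sqrt((true_sd ** 2 + noise_sd ** 2) / respondents)


# Hypothetical figures: answers with a spread of 10, noise with a spread of 20.
for n in (100, 10_000, 1_000_000):
    print(n, round(standard_error(10, 20, n), 3))
```

Under these assumptions the error shrinks with the square root of the respondent count, so a site with a hundred times more respondents gets an estimate ten times tighter.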
The problem with these techniques is that you can't force the user to do it manually (as they won't), and the user can't trust their own computer (running someone else's software) to do it for themselves. That latter objection is the one that has botched any number of theoretically sound online voting systems.
Useful in theory? Very. Useful in practice? Not so much.
The kind of questions that most of these sites ask include stuff that is impolite for friends to ask each other sometimes, never mind some random business. If they want accurate results, they should include the option for people to answer with a "MYOB" option. People are rather unlikely to keep tossing in crap data when they have the "MYOB" option, at least not in the 40% range. There is no way in hell that anyone making 100k+/year would actually admit it and give a business their real e-mail address. They would be begging for a flood of advertisements.
Why is it that online businesses feel they have the right to try and force so much personal information out of us? In brick 'n mortar stores, the worst info anyone asks me for is my zip code (or age to purchase alcohol). They can get my name if I use my credit card, but I can easily pay cash to avoid that.
It's very ironic that NYTimes would run this story.... Why do they expect me to tell them where I live, work, and what I make, just to read their articles? The paper version is nowhere near this invasive.
Yes, this is something that seems to have been overlooked by the other posters. They seem to look at the surveys as a way to steal their privacy, whereas I simply see them as a waste of time. I would say this is true of a vast majority of the people I know, many of whom don't even understand that people go out of their way to track them.
If people didn't try to make their surveys mandatory or obnoxious, they might receive more truthful answers from that subset of the population that gives enough of a crap to tell them they earn $30,000/y, work as a waitress, have 2.5 children, and heard about this site/contest/whatever from spam e-mail sent by the publishers of the survey.
Until then, beware my "Bob Dole" fake survey answers.
Gallup has been randomizing the order of poll answers for many, many years now.
The next major step Gallup made was to randomly give slightly differing forms of a question to estimate the systematic bias due to the phrasing of a question. In my opinion, that is the real key to error estimation in polls. Since you have to phrase a question one way or another, you can never really remove this kind of systematic error, but at least you can estimate how large the error is.
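A minimal Python sketch of that kind of estimate, assuming two wordings of a yes/no question are handed out at random and the 'yes' rates are then compared. This is not Gallup's actual procedure; the function name `wording_effect` and the counts are invented for the example.

```python
import math


def wording_effect(yes_a, total_a, yes_b, total_b):
    """Difference in 'yes' rates between two randomly assigned wordings,
    together with the standard error of that difference."""
    rate_a = yes_a / total_a
    rate_b = yes_b / total_b
    diff = rate_a - rate_b
    se = math.sqrt(rate_a * (1 - rate_a) / total_a
                   + rate_b * (1 - rate_b) / total_b)
    return diff, se


# Made-up counts: 540 of 1000 say yes to wording A, 480 of 1000 to wording B.
diff, se = wording_effect(540, 1000, 480, 1000)
```

Because the two groups differ only in which wording they saw, a difference that is large compared with its standard error can be put down to the phrasing rather than to who was asked.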
...which is not unusual on Slashdot - I do it all the time as well.
The idea of randomising answers is not new. It has been used in 'socially sensitive' surveys for years, if not decades.
Simple explanation:
Have a survey of 10 questions people don't like to reveal the truth of, each with a yes/no answer.
For each question, either
a) reply truthfully
b) flip a coin and record whatever the coin gives.
If challenged about your answer, you can always say that's the answer the coin required.
Analyse the results for a large population of completed surveys. Any significant deviation from 50% yes and 50% no answers tells you which way the population answered, without revealing who actually holds those views.
All you need is a coin to randomise your answers. This is independent of any web form, doctored answer sheet etc etc - so particular answers cannot be pinned on you.
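The coin method above can be simulated in a few lines of Python. One assumption is added here that the description leaves open: a first coin flip decides between (a) and (b), so each option is used half the time. The names `randomized_answer` and `estimate_true_yes_rate` and the 30% figure are hypothetical.

```python
import random


def randomized_answer(truth, rng):
    """Answer one yes/no question under the coin scheme.

    Assumption: a first coin flip picks between (a) and (b); heads means
    answer truthfully, tails means record a second coin flip instead.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_yes_rate(answers):
    """Invert observed = 0.5 * true + 0.25 to recover the true 'yes' rate."""
    observed = sum(answers) / len(answers)
    return min(1.0, max(0.0, 2 * (observed - 0.25)))


rng = random.Random(1)
TRUE_YES_RATE = 0.30  # hypothetical share whose honest answer is yes
truths = [rng.random() < TRUE_YES_RATE for _ in range(100_000)]
answers = [randomized_answer(t, rng) for t in truths]
estimate = estimate_true_yes_rate(answers)
```

With these assumptions the recorded 'yes' rate sits between the true rate and 50%, and undoing that known pull gives back an estimate close to the true 30%, while any single recorded answer may just be the coin.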
It's fun administering the same survey to people with and without the randomisation - you get to see what people in general lie about!
Hope this gives a useful summary of the method.
Regards,
pgrb
This line intentionally left..uh..blank?
>If you're one of those paranoid psychos, then don't give them your life story.
Too bad there's no "Skip this crap" option in their registration screen, huh?
So, the only way to not give them your life story is to lie. I know! Let's make it easy and create a random login generator so I don't have to type more random crap on every computer I use!
And, BTW, if you think I'm paranoid, I'll let you know that I was able to make any changes I wanted [but only did what I asked, of course] to my grandmother's phone line by simply asking her age and full name -- ALL of which are sent to NYT on that page. They only asked to hear a lady's voice, which my mother happily provided. Armed with just a birthdate and name I can make all sorts of changes to your services -- anonymously.
Knowing that, do you want to give me your name and address? If you don't, you should know there's no reason why I'm not working at the NYT right now... I will tell you that where I do work I have access to many, many, many records including Full Names and Birthdates. Feeling uneasy yet? Well, if you trust me, I've never abused those privileges.
>When they change their registration process and perhaps charging for their online content, don't start bitching.
My only bitching will be the fact their site goes offline for everyone. You can't compete in a (literally) Free market by charging infinitely more than your competitors. With the number of newspapers online right now, and the amount of good content that doesn't come from the NYT, I think they'll end up another Salon.
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC