Randomizing Survey Answers For Accuracy
Saint Aardvark writes: "The New York Times reports that two researchers at IBM have come up with a way to persuade people to give correct answers to survey questions: randomize the results. Strangely enough, they can get accurate information out of the aggregate of enough answers -- but it's completely anonymized. Since conservative estimates say nearly half of all survey answers are bogus, there's an interest in persuading people to be more truthful. As ever, you can use the Random NY Times Registration Generator to falsify your registration details and read the article..."
Ok, fine. They've managed to come up with a model that doesn't actually collect any data. And how will this help people to enter REAL data? People don't give data because they don't trust the company. If they don't trust the company, do you really think they'll believe some mumbo-jumbo about "randomizing"?
Javascript + Nintendo DSi = DSiCade
Did you lie when answering this question?
O Yes
O No
O Cowboy Neal told me the answer
Do they expect that people will enter real data on the mere promise that it will be stored in some randomized, aggregate, or other form that does not invade their privacy? If the coroporation could not be trusted in the first place, no statement they make will make them trustworthy.
I think there is something to be said about companies that ask for information as an option versus companies that ask for information as a requirement.
For example, company XYZ has released a program called Widget. In order to download Widget, users are asked to fill out a survey so that XYZ may guage the demographics of their target audience.
Some sites will allow you to bypass this step and proceed to download the software. Other sites require this information before revealing the download link. I think that the psychological difference between "required" and "optional" would heavily influence the honesty of the answers.
I know that I never honestly fill out required forms. I'll fill in a bunch of bogus details, get the link, and be on my way. However, if the form is optional, I may download first and, if I like the program, provide some details to the company. The difference? I'm not being forced to give anything up in advance.
Is this true in general? I don't know. But it makes sense to me.
I have an idea for something to replace the survey forms - an AI program to carry out a conversation with the user. Ah ha! We just have to watch out for users that say to the AI - "I am lying" - and hope the AI doesn't need therapy.
Price, Quality, Time. Pick none. What, you thought you had a choice?
Personally, if I don't trust them enough to tell them how much I make, I'm not going to trust them to randomize my results. I don't see how this will increase accuracy -- especially if I keep telling everyone I'm a 108 year old female in Uganda making $100,000+ per year year who works in the sales department of an Educational field and plans to make purchases of an suv, a house, a console gaming system, a optical mouse in the next six months and rates thier internet experience as very low. My e-mail address is sjobs@mac.com and I would like to apply for your quarterly, monthly, weekly, daily, and hourly newsletters and I do give permission to pass this information to your affiliates.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
Let me summarize:
1) People lie on surveys, most likely because they don't trust the taker - but probably also just because they like putting in other answers (yeah, I'm a millionaire, woohoo!, etc). This only addresses the trust issue, ignoring other ptential sources of lying.
2) In order to work around the trust issue, they've developed a method of injecting random noise into the original answers as they are recorded and then extracting useful data in the end.
Notice their technology doesn't do anything to fix the underlying problem. The hope is that users will understand and trust the backend randomizer system, and that based on this trust they will answer more truthfully.
Without bothering with all this mumbo-jumbo, I can build a trustworthy system. I simply record survey statistics, and I promise not to use the individuals' personal data invidually.
They can either trust me that I'm telling the truth about this, or they can lie. In the IBM researchers' scenario, the users are again asked to trust that the backend system doesn't compromise them, and again they can choose to trust it or choose to lie.
Given the above, why on earth would you bother with this research and uneccesary complexity. It's not going to make any difference over just promising your users that you don't invade their privacy. You could replace their research results with a banner on top of the survey that says "After you sumbit your data to us, we use Magical HibiJibi technology to prevent ourselves from invading your privacy, so please trust us and answer truthfully"
What a waste of research.
11*43+456^2
Interesting approach, but useless unless people actually understand and trust the system. For this to happen will probably require widespread adoption, an easy to understand explanation of the process, and assurances that answers really are randomized. These requirements obviously force a bit of a chicken and the egg scenario.
Explaining the whole randomization process (how it protects privacy, how it provides useful info) will be a little much for most people I think, but a good user interface might alleviate this, perhaps with a 'randomize' button that is used before hitting the 'submit' button. This would take the user input and change it right in front of their eyes. Of course many would be rightfully concerned that the randomize button is just for show (or simply encodes but doesn't anonymize), but I think that enough people might buy into the false sense of security that demonstrated 'randomization' provides to at least partly improve the % of bonafide results. Also, the system could be set up so users who don't mind submitting traceable information could be encouraged ("extra 10% off") to submit without randomization, with a simple flag sorting data into randomized/anonymous and non-randomized/non-anonymous data).
This approach would be even better if the randomization approach becomes a ubiquitous standard backed by a consistent and legally accountable and well-known entity/brand (IBM for instance). I'm not sure how well an open solution would work unless there was a central group assuming responsibility and accountability for the system, enforcing trademarks, and suing spoofers. Also, people feel safer when they feel there's someone to blame for any abuse/mistakes (hence, giving their credit card freely to a waiter but not to a website).
My next sig will be ready soon, but friends can beat the rush!
As another poster observes, if you don't trust them with the data, why trust them to randomize it?
My college stats professor 10 years ago explained a simpler trick that puts control in the respondant's hands. It went something like this:
With each question, the respondant flips a coin and looks at the second hand of a clock. Only the respondant can see the coin or the clock.
If the second hand is between 1-30 seconds, they answer per the coin (e.g. heads=yes). If it's between 31-60, they tell the truth.
The surveyor, knows very precisely the number of 'lies', can extract accurate data, and the respondant has confidence and control over their privacy. All without a transistor.
Hey, it's me. The guy who put together and hosts the New York Times random login generator. First off, thanks for all your cards and letters - I originally just created that page to save myself some trouble, but I'm glad to see that everyone likes it so much.
I'd also like to remind anyone who wants to download, copy, and mirror the source of that page on their own servers, or even as an HTML page on your desktop or whatever. It's just javascript, so it's portable, and that way you'll still be able to use it when the NYT lawyers finally get around to noticing it or they start blocking requests from my page or something. (It will also help distribute my load, though I haven't had any real trouble yet...)
Timeo idiotikOS et dona ferentes