Math Indicates Pollster Is Forging Results

Posted by Soulskill on Friday September 25, 2009 @12:25PM from the lies-damned-lies-and-statistics dept.

An anonymous reader writes "Nate Silver suggests the political pollster Strategic Vision is 'cooking the books. And whoever is doing so is doing a pretty sloppy job.' Silver crunched five years' worth of their polling data and found that their reported results followed a suspicious pattern which traditionally suggests fraud. The five-year distribution of the numbers 'is not random. It's not close to random.' The polling firm had already been reprimanded by the American Association for Public Opinion Research for failing to disclose their methodology, though the firm argues they did comply with the organization's request. Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"

26 of 319 comments

major fcukup at slashdot by postmortem · 2009-09-25 12:33 · Score: 5, Informative

a. you can't post
b. if you do manage to post, post goes to wrong topic!
1. Re:major fcukup at slashdot by multisync · 2009-09-25 12:53 · Score: 3, Informative
  
  Yeah, it's been like that off and on all day.
  To those with mod points: use them on something worthwhile. Noting that your posts are turning up in the wrong topic is on topic. Modding postmortem's post Off Topic is a misuse of your mod points.
  
  --
  I don't care why you're posting AC
Re:Why should I care? by TeethWhitener · 2009-09-25 12:50 · Score: 5, Informative

In other words, do they do stuff that actually matters?
In a word, yes. Nate Silver manages the blog FiveThirtyEight and is well-known as a statistical analyst from the 2008 US election (among other things). Strategic Vision has released quite a few polls. In Silver's words,

...Strategic Vision's polls cover a wide array of topics: Presidential horse race numbers in any of a dozen or so states, senate and gubernatorial polling, primary polling, approval ratings of various kinds, polling on issues like the war in Iraq, and more abstract questions such as whether voters think that 'experience' or 'change' is the more important quality in a Presidential candidate.
So yes, this is pretty big news, should it turn out that Strategic Vision's behavior is in fact illicit. They're influential enough that news agencies may pick up their polling results. This is bad enough, but when you factor in the fact that polling results can be very effective propaganda in something like a presidential race, fraudulent polling can have significant consequences.
Re:Ah ha! by etymxris · 2009-09-25 13:05 · Score: 2, Informative

Not sure if you're trying to make a pun, but "categorical" in this case means "without exception." For example, Kant talks about categorical and hypothetical imperatives. Categorical imperatives you do always without exception (such as never lying, according to Kant anyway). Hypothetical imperatives are what you do based on the situation (CPR is appropriate only when someone is not breathing, for example).
Handwaving math. by Gorobei · 2009-09-25 13:11 · Score: 3, Informative

Nate Silver does great analysis at the first order multiple-linear-regression level -- he outperformed all the other polls/predictors in 2008 iirc.
He sucks at meta-analysis though, in that he just doesn't understand the math. His 2008 Monte Carlo stuff gave good results, but was just a bad reinvention of averaging. His recent foray into analyzing stock returns was interesting but 0-information (i.e., useless).
Now he's mentioning Benford's law, but playing with trailing digits. Then he handwaves a non-normal result with an appeal to "it looks wrong." Come on, give us some real math here!
That said, he's probably right, but he's given us no math to support his claim.
1. Re:Handwaving math. by Artifakt · 2009-09-25 16:43 · Score: 3, Informative
  
  Benford's law is sometimes called the First Digit law. It deals with cases where numbers are not equally probable, but rather lower integers are more common than higher ones. A good example of such a number is the first digit of street addresses. There are many short streets that only have a 100's block, and only a portion are long enough to also have a 200's block, fewer to have a 300's block, and so on, so the first digit is not equally likely to be, say, a 4 or a 7, rather there will be more fours than sevens. Some stock market numbers should fit Benford's law, and there are plenty of other cases with real world applications.
  However, the law in extended form does work for second or higher digits, or cases where the most likely value for a digit is not 1. Take the IRS for example. Last year, the standard deduction for married filing jointly was an even $10,000. Many people didn't bother to itemize Schedule A unless it got them at least a couple of hundred extra back. So there were many people who claimed $10,2XX on their itemized returns, a few fewer who claimed in the $10,3XX range, and so on. $10,0XX or $10,1XX values probably weren't the most common, because a lot of people probably didn't bother to gather all the records needed and do all the paperwork if they thought it was only going to get them, say, an extra $27 or even $104.
  The IRS could, and probably does, use Benford's law to look for number patterns that may indicate fraud, but for some of those numbers, it's the second or a later digit that they should start at. (They won't publicly discuss whether they have any sorting/flagging software that is Benford's law based. I suspect they do, as it would be foolish not to take advantage of the math here, but I have absolutely no proof other than that I use some of the same math in a private role, and it's been damned useful a couple of times in spotting a client trying to get me involved with something shady, so it should work equally well for the government.)
  So, using Benford's law for second or other trailing digits is legitimate. I can't tell from the article whether Nate Silver is doing everything else correctly, but the extension to a particular trailing digit isn't itself a flaw, and I could come up with a good psychological argument why humans might fudge the second digit by a point or two, but only when it isn't already an 8 or 9, so as not to make the 10's digit roll, so focusing on digit 2 could certainly be justified (as could focusing on the second digit to the right of a decimal point for precision results, by much the same logic).
  
  --
  Who is John Cabal?
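Artifakt's description of Benford's law can be made concrete. The law gives a leading digit d a probability of log10(1 + 1/d), and the standard second-digit version sums that expression over every possible first digit. A minimal Python sketch follows; the function names are made up for illustration, and whether poll percentages follow Benford's law at all is a separate question the sketch does not address.

```python
import math

def benford_first_digit():
    """Benford probabilities for a leading digit of 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_second_digit():
    """Benford probabilities for a second digit of 0..9, summed over first digits."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }

for digit, p in benford_first_digit().items():
    print(f"first digit {digit}: {p:.3f}")
for digit, p in benford_second_digit().items():
    print(f"second digit {digit}: {p:.3f}")
```

The second-digit probabilities come out far flatter than the first-digit ones, so a Benford-style argument about trailing digits predicts only a mild tilt toward low digits.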
Re:Too many 7s and 8s? by HornWumpus · 2009-09-25 13:19 · Score: 3, Informative

Take any data set and you'll find patterns that are statistically impossible.
Not if you understand statistics.
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Use stats, not laws by unlametheweak · 2009-09-25 13:22 · Score: 2, Informative

Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"
Generally, I would expect the logical course of action for an honest and transparent firm to be hiring a statistician to vindicate itself. Lawyers don't make a reputable firm appear any less reputable.
Re:Too many 7s and 8s? by blueskies · 2009-09-25 13:23 · Score: 2, Informative

If I take any data set (say one with a standard distribution), how many of those data sets would I have to sample on average before I found one that looked like the ones he is talking about? If the expected number of data sets I would have to look at is in the millions, you are correct in that I might find it in my first sample, but the chances are incredibly tiny.
Re:Too many 7s and 8s? by evanbd · 2009-09-25 13:30 · Score: 4, Informative

Fortunately, there are corrections you can do for that. And he ran a fairly normal statistical test on the numbers, which is equivalent to saying he didn't perform that many comparisons. To a very rough approximation, you need to correct your p-value for all the less weird analyses you might have performed on the data instead. It's a bit hard to pin down an exact p-value for the analysis he did (the underlying data isn't expected to be flat; it's also not expected to be that bizarrely lumpy), but I promise that Nate Silver has an understanding of this issue (which you'd see, if you'd read the post).
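One common, deliberately conservative way to do the kind of correction described above is a Bonferroni adjustment: multiply the raw p-value by the number of comparisons that could plausibly have been run, capping the result at 1. This is a minimal sketch assuming Python. The function name and the example numbers are invented for illustration, and nothing here claims this is the correction Silver actually applied.

```python
def bonferroni(p_raw, n_comparisons):
    """Bonferroni-adjusted p-value: raw p times the number of tests, capped at 1."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return min(1.0, p_raw * n_comparisons)

# Hypothetical numbers: a one-in-a-million raw result, assuming 100 other
# analyses could have been tried on the same data.
print(bonferroni(1e-6, 100))
# A weak raw result does not survive the same correction (capped at 1.0).
print(bonferroni(0.02, 100))
```

A very small raw p-value stays small even after allowing for many alternative analyses, which is the point of the comment.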
Re:Why should I care? by bfields · 2009-09-25 13:38 · Score: 5, Informative

if they're the same "strategic vision" that the article is talking about, their webpage says "Strategic Vision has worldwide experience developing tools to measure decision-making, human behavior, attitudes and perceptions....
Nope, you're looking at the webpage of a different company! See Nate's previous article:

Why would you pick the name "Strategic Vision, LLC" for your company when the name "Strategic Vision, Inc." was already in use by an extremely well regarded, San Diego-based research firm that has been in business for more than 30 years? Are you deliberately trying to confuse your potential clients and leverage Strategic Vision, Inc.'s much stronger brand name?
Re:Why should I care? by maxume · 2009-09-25 13:42 · Score: 3, Informative

NBC always reports on the NBC/Wall Street Journal poll. I think they commission it. They seem to do a decent job of describing how they do it:
http://online.wsj.com/article/SB124527518023424769.html
(that link works when clicked on from a Google search, but given that the WSJ has a mighty paywall, I don't know if it will work otherwise)
So maybe you need to talk about a more nuanced group than 'the media' (I wouldn't be particularly shocked if other major outfits were at least approximately as responsible).

--
Nerd rage is the funniest rage.
Re:Why should I care? by quantaman · 2009-09-25 13:45 · Score: 4, Informative

Second, if they're the same "strategic vision" that the article is talking about
They're not. From another helpful article on FiveThirtyEight:
Why would you pick the name "Strategic Vision, LLC" for your company when the name "Strategic Vision, Inc." was already in use by an extremely well regarded, San Diego-based research firm that has been in business for more than 30 years? Are you deliberately trying to confuse your potential clients and leverage Strategic Vision, Inc.'s much stronger brand name?
You're looking at the page from the well regarded Strategic Vision, Inc. Funny that SV LLC seems to be so happy to sue Nate Silver, it would seem that SV Inc has a far stronger case against SV LLC.
Could be an interesting intersection of Trademark/Slander laws...

--
I stole this Sig
Re:improbable by Silentknyght · 2009-09-25 13:55 · Score: 2, Informative

However, I'm unconvinced that this is some sort of smoking gun; Silver needs to really run this sort of simplistic analysis on a lot of other polls and see if there in fact is a bias towards a 47 - 43 split with 10% undecided. That actually sounds about right for a lot of the polls I remember in the last election.
If you read the TFA, Nate addresses this. He states that his data--SV LLC's polling results--are selected from a wide, wide, wide variety of topics, not just necessarily the highly divisive ones where there may be a relatively even split between two choices.
Moreover, (as Nate states) over enough data, even the effect of the undecided percentage on the trailing digit should be random.
Re:Why should I care? by (startx) · 2009-09-25 13:57 · Score: 4, Informative

Except you've linked to the wrong company. Strategic Vision, Inc. is a well respected 30-year-old polling firm in California. Strategic Vision, LLC is the shady 5-year-old GOP shill corp with questionable poll results and no real office (or polling results allegedly). Careful with those links, you don't want to slander the wrong company here. I think SV Inc. may have a trademark case on their hands if they're feeling litigious.
Re:Why should I care? by Attack+DAWWG · 2009-09-25 14:09 · Score: 5, Informative

They are a partisan, Republican-oriented polling company. They have gotten into trouble in the recent past for their questionable results.
Re:What's wrong with this data? by Johnny+Loves+Linux · 2009-09-25 14:19 · Score: 2, Informative
> since when are gaussians not random?
That's exactly the problem he's pointing out. The second digit should be a UNIFORM distribution if it came from real data. If the digits are gaussian that indicates that either
- there's some process accounting for a gaussian distribution that he doesn't know about (and he does consider that possibility) or
- the numbers are cooked by a human being who has a preference for 8's over other digits.
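To get a feel for why the trailing digit of honest poll numbers should come out close to uniform, here is a minimal simulation sketch. Everything in it is an assumption for illustration: results are drawn from a normal distribution with an invented mean of 45 and standard deviation of 8, then rounded to whole percentages. None of those numbers come from Silver's data.

```python
import random
from collections import Counter

def trailing_digit_counts(n_polls, mean=45.0, sd=8.0, seed=0):
    """Count trailing digits of simulated whole-percentage poll results.

    mean and sd are made-up parameters for the sketch, not fitted values.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_polls):
        pct = round(rng.gauss(mean, sd))   # round to a whole percentage
        pct = min(max(pct, 0), 100)        # keep it a valid percentage
        counts[pct % 10] += 1              # trailing digit
    return [counts[d] for d in range(10)]

counts = trailing_digit_counts(100_000)
for digit, c in enumerate(counts):
    print(digit, c, f"{c / sum(counts):.3f}")
```

Under those assumptions each digit shows up in roughly a tenth of the simulated polls, even though the underlying results are bell-shaped, so a strong preference for one trailing digit would need some other explanation.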
Re:Not statistically significant by ceoyoyo · 2009-09-25 14:24 · Score: 5, Informative

First, the example he gives where he looks at polls from ALL sources is an example of a plausible distribution of real results because, assuming the majority of pollsters are not cooking their data, the data should be dominated by randomness. He then looks at this particular pollster and finds a much greater disparity in trailing digit frequency. The question is, is it significant, or just chance?
Given the numbers, it's not particularly hard to figure out. You can calculate the likelihood of any particular result given a theoretical distribution using a G test of goodness of fit. Technically for numbers this small you could use an exact test but I don't know of a web version and I'm too lazy to write one up. But here's a description of, and an Excel spreadsheet that performs, the G test of goodness of fit: http://udel.edu/~mcdonald/statgtestgof.html
Basically, you plug in the distribution you see and compare it with the one you expected. What you get is the probability of that distribution occurring by chance. So if we plug in the observed data for all the pollsters and assume equal likelihood for all trailing digits we get a p=0.006. Whoops, looks like our assumption isn't quite correct. As the blog author notes, the observed distribution is humped a little, favouring the middle numbers. He also gives a possible explanation. For giggles, the probability of the Strategic Vision results given equally probable trailing digits is absolutely microscopic: p=1.44x10^-17. Together those tell us that our assumption of equal digit distribution is probably not quite right, but the Strategic Vision data still looks mighty funny.
Okay, so assume instead that most pollsters aren't making up their numbers. Not that their numbers are necessarily accurate, but that they're at least not making them up off the top of their heads. So using the data from all pollsters as a template, how likely is the Strategic Vision distribution? That's a G test of independence: http://udel.edu/~mcdonald/statgtestind.html. We could use Fisher's exact test, but I can't find one that will do a 2x10 table.
Plugging in the data, we get G=43.068, d.f.=9, which gives p=2.09x10^-6. The blog author was actually a little careless when he said the chances of Strategic Vision's results are millions to one against. If you insist on the equal-probability theory then the odds are 70 quadrillion to one against Strategic Vision and 166 to one against the industry as a whole. Taking the more realistic approach that the industry average is a better representation of the actual probability, the odds against Strategic Vision's results are about half a million to one. Not millions to one, but close enough.
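For anyone who would rather not use a spreadsheet, the goodness-of-fit version of the G test is short enough to sketch in plain Python. It computes G = 2 * sum(O * ln(O / E)) and converts it to a p-value using the chi-square upper tail, which for whole-number degrees of freedom has a closed form. The function names and the digit counts below are invented for illustration; they are not Strategic Vision's actual numbers. The last line only checks the conversion from the quoted G=43.068, d.f.=9 to a p-value. It does not rerun the test of independence described above.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    else:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def g_test_gof(observed, expected_props=None):
    """G test of goodness of fit; returns (G, degrees of freedom, p-value)."""
    n, k = sum(observed), len(observed)
    if expected_props is None:
        expected_props = [1.0 / k] * k        # default: all categories equally likely
    g = 0.0
    for o, p in zip(observed, expected_props):
        if o > 0:                             # a zero count contributes nothing
            g += 2.0 * o * math.log(o / (n * p))
    return g, k - 1, chi2_sf(g, k - 1)

# Hypothetical trailing-digit counts for digits 0..9 (made up for the example).
hypothetical_counts = [40, 45, 52, 55, 60, 58, 62, 70, 75, 43]
print(g_test_gof(hypothetical_counts))

# Converting the G and d.f. quoted above into a p-value and odds.
p = chi2_sf(43.068, 9)
print(p, f"about {1 / p:,.0f} to one")
```

Passing a list of proportions as `expected_props` tests against a non-uniform template, such as the industry-wide digit distribution, instead of equal likelihood.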
Re:Why should I care? by interkin3tic · 2009-09-25 15:05 · Score: 4, Informative

I hereby take back everything I said about Strategic Vision and reapply it to Strategic Vision, LLC, times two.
Re:Too many 7s and 8s? by bidule · 2009-09-25 15:20 · Score: 2, Informative

Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
If you understood thermodynamics, you'd know that 'statistically impossible' is why the world doesn't go crazy, like the sudden appearance of a vacuum when you try to breathe or the random melting of a spoon while stirring your coffee.

--
ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
Re:Why should I care? by Discordantus · 2009-09-25 15:48 · Score: 2, Informative

It shouldn't (but probably will) be considered trolling to point out that the political section of their client list consists of the Republican Party, the Conservative Party (of England), the Department of Defense, the White House, and the State of California. That section hasn't changed in the last year, so I assume it's referring to not only the Republican governor of California, but also Dubbya's White House. Sounds like they get most, if not all, of their political business from conservative sources.
Re:What's wrong with this data? by Henry+V+.009 · 2009-09-25 15:56 · Score: 1, Informative

Really, why not try proving that a particular digit should be uniformly distributed? I'll give you a minute.

Not done yet? I'll give you a hint: Benford's Law shows why it doesn't have to have a uniform distribution. The original critique is likely fallacious.
Re:Why should I care? by plague911 · 2009-09-25 16:18 · Score: 4, Informative

"Yeah, you go ahead and cling to the belief that the insurance industry doesn't want the health care bill to go through"
You are right: the insurance industry would stand to gain massively from that proposal. That's exactly why the liberal sect of the Democratic Party has been fighting that provision.
I would like to point out that the insurance industry is being very pragmatic: they have a two-tier battle plan. They don't want the bill to pass; however, if it does pass, they want to have things like that put in.
That provision was added to some of the bills to "tempt" Republicans into voting for it, as several Republicans have explicitly said they would like to see that included.
As far as "I'd certainly like to see some numbers regarding who the insurance industry as a whole is contributing to": the money has been flowing quite rapidly into the conservative arm of the Democratic Party. Ben Nelson, Mary Landrieu and Max Baucus have all gotten heavy donations from insurance companies since this whole thing started. That is not to say that the Republicans have not been getting a lot of money from the insurance companies (that goes without saying). So, to sum it up, Republicans are continuing to get good paychecks (the usual); however, some conservative Democrats are now also getting paid for their services (newish). Just for your info, many progressives want political blood for this. Ben Nelson and Max Baucus, and to a much lesser extent Mary Landrieu, are the one thing standing in the way of progressives' holy grail. For that, many of us want political revenge at any cost.
Re:Why should I care? by petermgreen · 2009-09-25 21:44 · Score: 4, Informative

Well, you might need to explain what astroturfing is
Astroturfing is where a special interest tries to create the impression of grassroots support. That may be through paying shills to post a lot on message boards with posts that support your position, it may be through dodgy polls or it may be through other means.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Why should I care? by Anonymous Coward · 2009-09-26 03:07 · Score: 1, Informative

Learn the rules of grammar. Stop making stupid mistakes with spelling.
If you do that, the curmudgeons will be out of jobs.
In the real world... by DragonWriter · 2009-09-26 04:35 · Score: 2, Informative

If the vote is to reflect public opinion, people should vote their own opinion. They don't need to try to help the system by guessing the most popular option.
Sure, in an unattainably perfect world with perfect election systems, this would be true. However, one must note that it's impossible to have a single-winner voting system where more than two candidates stand for election and where strategic voting is not rewarded, if voting actually matters at all.
In the real world, strategic voting which takes into account the preferences and likely behavior of other voters, assuming it is based on accurate information, produces better results than blindly voting your own true preferences.
Even ignoring the incentives for strategic voting, though, there is a cost-benefit analysis in pre-voting activities which affect the success of candidates and ballot propositions -- even if a person believes something is a good idea and plans to vote for it, they are far less likely to expend resources (whether by donations of money or of time and effort) if they feel that those resources are unlikely to make a difference in the outcome.
So, ultimately, there are good reasons why people's understanding of the popularity of a political idea or candidate affects their behavior regarding that idea or candidate.