Math Indicates Pollster Is Forging Results
An anonymous reader writes "Nate Silver suggests the political pollster Strategic Vision is 'cooking the books. And whoever is doing so is doing a pretty sloppy job.' Silver crunched five years' worth of their polling data, and found their reported results followed a suspicious pattern which traditionally suggests fraud. The five-year distribution of the numbers 'is not random. It's not close to random.' The polling firm had already been reprimanded by the American Association for Public Opinion Research for failing to disclose their methodology, though the firm argues they did comply with the organization's request. Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"
I like that in your enthusiasm for all things private launch industry, you didn't notice that you'd loaded the wrong story to comment on... :)
a. you can't post
b. if you do manage to post, post goes to wrong topic!
Pretend I know nothing about Pollster (which happens to be true). Why should I care whether they've faked results? By that, I mean: do they research options of favorite flavors of cotton candy, or public support for health care reform, or the best style of car, or...? In other words, do they do stuff that actually matters?
Dewey, what part of this looks like authorities should be involved?
what exactly do these guys have to do with NASA?
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
The nasa story is not working. I guess at this time, NASA stands for "Need Another Slashdot Area".
I prefer the "u" in honour as it seems to be missing these days.
I call total, 100%, biased, fuck me up the ass horseshit on this inane accusation. Lies, damn lies, and statistics.
So, which category do they deny? The category of truth or the category of lying?
It's NOT me! It's the meds! I'm on 1000mg of Fukitol.
Polls show that 78.6% of all statistics are made up on the spot.
"National Security is the chief cause of national insecurity." - Celine's First Law
Anyway, back to the topic of Windows 98 being released today. I wonder if the Clinton Administration will continue with the anti trust investigations into M$.
It's NOT me! It's the meds! I'm on 1000mg of Fukitol.
Agreed. Who is this Math guy anyway? Perhaps it's Math who faked the results, and Pollster is beyond reproach!
BREAKING NEWS:
The AP is reporting a major fuckup at Slashdot. The web site cannot even do the most basic task essential to its operation: allowing readers to leave comments on articles. No comments were available from anyone employed by the web site. Phones rang and rang and rang. Several other Sourceforge properties had their numbers disconnected due to non-payment.
It is apparent no one in charge of the place gives even a sliver of a fuck, or even reads the front page after articles are posted, as it is 2009 and there are 50 fucking ways to notify the readership of the nature of the problem and the expected timeline for resolution. And that 50 is just from a fucking cell phone. If a person had an actual computer and an internet connection, even a netbook at a Starbucks, the number rises into the 1000s.
Long gone are the days when the popular geek web site devoted to technology actually worked. Long gone are the days when there were actual technical explanations of outages. Instead it's more stories about politicians arguing over traffic ticket revenue posted as "Your Rights Online", iPhone slashvertisements, slashvertisements masquerading as book reviews, and links to people's blogs about blogs about news stories, and/or tweets about tweets about press conference summaries.
I'm not sure I understand what Silver is claiming about the data.
He shows that the distribution of second digits in the results of Pollster's polls doesn't follow a uniform distribution -- and from that he somehow deduces it's not random.
If you look at the figure in the second article, it looks to my untrained eyes like a gaussian curve with maximum around 8 -- since when are gaussians not random?
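To make the trailing-digit idea concrete, here is a minimal Python sketch that tallies the last digit of each reported percentage. The `results` list is made up purely for illustration (it is not Strategic Vision's data), and `trailing_digit_counts` is a hypothetical helper name.

```python
from collections import Counter

def trailing_digit_counts(percentages):
    """Count how often each last digit (0-9) appears in whole-number percentages."""
    counts = Counter(p % 10 for p in percentages)
    return [counts.get(d, 0) for d in range(10)]

# Made-up poll results, for illustration only.
results = [47, 48, 38, 52, 41, 57, 48, 45, 50, 37]
counts = trailing_digit_counts(results)  # index d holds the count for digit d
```

With real data you would feed in every topline number a pollster has published and then ask whether the ten counts look like what other pollsters produce.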
Reading TFA, Nate's analysis implies that there is a systematic bias toward some last digits in the overall poll percentages aggregated over many disparate topics.
What seems so improbable (to me) is that if someone really were grossly "cooking the books" like this (literally not doing the poll, or tallying any numbers at all, but instead simply reporting fake results for press), they would be so stupid as to make up the results manually instead of using a computer in some way. What, some guy in an office reading other polls and saying "gee I think the number will be 45%"?
If this kind of bias really has been introduced by manually creating and publishing the results (as the analysis seems to imply), then it will be easy to track down and prove with further digging into the data, interviewing people who made the calls or took the data, etc. However, accepting such an explanation would require a level of stupid on the part of the principals in this company that is so extreme that I find such a scenario an improbable explanation for the results presented.
Nate Silver does great analysis at the first order multiple-linear-regression level -- he outperformed all the other polls/predictors in 2008 iirc.
He sucks at meta-analysis though, in that he just doesn't understand the math. His 2008 monte-carlo stuff gave good results, but was just a bad reinvention of averaging. His recent foray into analyzing stock returns was interesting but 0-information (i.e. useless.)
Now he's mentioning Benford's law, but playing with trailing digits. Then he handwaves a non-normal result with an appeal to "it looks wrong." Come on, give us some real math here!
That said, he's probably right, but he's given us no math to support his claim.
... and pollster's statistics
Most M2s aren't going to read the entire discussion at -1 before M2ing. I know I don't. Sorry.
By the way, the unambiguous word you're looking for is "broken". And please don't say "ppl". This is Slashdot.
# cat
Damn, my RAM is full of llamas.
Take any data set and you'll find patterns that are statistically impossible.
Not if you understand statistics.
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"
Generally, I would expect a logical course of action from an honest and transparent firm would be to hire a statistician to vindicate themselves. Lawyers don't make a reputable firm appear any less reputable.
If i take any data set (say one with a standard distribution), how many of those data sets would i have to sample on average before i found one that looked like the ones he is talking about? If the expected number of data sets i would have to look at is in the millions, you are correct in that i might find it in my first sample, but the chances are incredibly tiny.
You meant to post this in the thread for the new game with time travel right? :-]
Inane Comments are Generously Disregarded
Fortunately, there are corrections you can do for that. And he took a fairly normal statistical test on the numbers, which is equivalent to saying he didn't perform that many comparisons. To very rough approximation, you need to correct your p-value for all the less weird analyses you might have performed on the data instead. It's a bit hard to pin down an exact p-value for the analysis he did (the underlying data isn't expected to be flat; it's also not expected to be that bizarrely lumpy), but I promise that Nate Silver has an understanding of this issue (which you'd see, if you'd read the post).
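One standard, deliberately conservative way to make that kind of correction is a Bonferroni adjustment: multiply the p-value by the number of analyses that could have been run. The sketch below applies it to the p=2.09x10^-6 figure quoted elsewhere in this discussion. The count of 1000 alternative analyses is a hypothetical number chosen only for illustration, and nothing here claims this is the correction Silver had in mind.

```python
def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

p_raw = 2.09e-6        # from the G test of independence quoted in this thread
n_alternatives = 1000  # hypothetical number of analyses that might have been tried
p_adjusted = bonferroni(p_raw, n_alternatives)
```

Even under that pessimistic assumption the adjusted value stays well below the usual 0.05 threshold, which is the gist of the argument above.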
Here is some more info about them. According to the article they are a "Republican-oriented polling firm based in Atlanta."
I think it would have been nice for the poster to indicate that this refers to Strategic Vision, LLC., and *not* Strategic Vision, Inc., since Nate Silver specifically suggests that they may be trying to play off the credibility of the latter.
It's "Who cares?"(s) all the way down.
There's not enough eigenvectors in this thread...
http://controls.engin.umich.edu/wiki/index.php/Occasionally_dishonest_casino:_crimes_or_just_noise%3F
When you are making decisions based on public opinion, the difference between 52% and 48% can decide whether you keep your elected position. Imagine what the difference would be between 60% and 40%. I'm not sure of the exact reasoning for these kinds of polls; 20% seems to be about the standard margin of error. I imagine it has something to do with the state of the art in statistical estimation theory. In which case a 57% to 20% difference would be like using 1850's technology compared to the technology we will have in 2010. At the very least Strategic Vision is run by idiots. If not, they are intentionally misleading the public.
Seems like NASA story comments are appearing in here. Tragically, GP might have been modded off topic and now mocked through no fault of his own. There is no justice in this world *shakes fist at the gods*
Negative moral value of force outweighs the positive value of good intentions.
By all means, don't back up your numbers with mathematical proof of your own.
I can't wait to see how they try to sue mathematical laws and formulae.
Protip: Winning a lawsuit doesn't make you any less a liar.
I've done more statistics than you give me credit for. And "statistically impossible" is a manner of speech. Anyway, here's what I'm talking about:
What is the probability that the first result will be a1, the second a2, the third a3, and so on for some given set of constants a1...an? Well, the probability that the results measured will be those exact values is pretty much impossible, statistically speaking. Well, it is if you're asking the question before you actually do your experiments. But if you have your experimental results at hand, you can ask that question, having the benefit of having the results right in front of you, and make the results seem implausible. Of course, that example is very simplistic. In any set of data, there is some pattern that, a priori, is nearly impossible to obtain. But there are an uncountable number of such patterns. So for every data set, you can find some pattern that shouldn't happen.
FYI:
Good: Strategic Visions Inc. @ http://www.strategicvisionsinc.com/
Suspect: Strategic Visions, LLC @ http://www.strategicvision.biz/
See: http://www.fivethirtyeight.com/2009/09/few-more-questions-for-sketchy-pollster.html
Wow, I even screwed it up... they're both "Strategic Vision" without the s at the end, and the Good is at http://www.strategicvision.com/
First, the example he gives where he looks at polls from ALL sources is an example of a plausible distribution of real results because, assuming the majority of pollsters are not cooking their data, the data should be dominated by randomness. He then looks at this particular pollster and finds a much greater disparity in trailing digit frequency. The question is, is it significant, or just chance?
Given the numbers, it's not particularly hard to figure out. You can calculate the likelihood of any particular result given a theoretical distribution using a G test of goodness of fit. Technically for numbers this small you could use an exact test but I don't know of a web version and I'm too lazy to write one up. But here's a description of, and an excel spreadsheet that performs, the G test of goodness of fit: http://udel.edu/~mcdonald/statgtestgof.html
Basically, you plug in the distribution you see and compare it with the one you expected. What you get is the probability of that distribution occurring by chance. So if we plug in the observed data for all the pollsters and assume equal likelihood for all trailing digits we get a p=0.006. Whoops, looks like our assumption isn't quite correct. As the blog author notes, the observed distribution is humped a little, favouring the middle numbers. He also gives a possible explanation. For giggles, the probability of the Strategic Vision results given equally probable trailing digits is absolutely microscopic: p=1.44x10^-17. Together those tell us that our assumption of equal digit distribution is probably not quite right, but the Strategic Vision data still looks mighty funny.
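For anyone who would rather not use the spreadsheet, the G statistic itself is short to compute. This is a minimal sketch in Python: `g_goodness_of_fit` is a hypothetical helper name, and the `lumpy` counts are made up for illustration, not the actual digit counts from the blog post. The statistic is then referred to a chi-square distribution with (number of categories - 1) degrees of freedom.

```python
import math

def g_goodness_of_fit(observed, expected_props):
    """G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    n = sum(observed)
    g = 0.0
    for o, prop in zip(observed, expected_props):
        if o > 0:
            g += o * math.log(o / (n * prop))
    return 2.0 * g

uniform = [0.1] * 10                               # equal likelihood for digits 0-9
flat = [50] * 10                                   # perfectly even counts
lumpy = [50, 30, 40, 45, 50, 55, 60, 90, 80, 50]   # made-up, uneven counts

g_flat = g_goodness_of_fit(flat, uniform)
g_lumpy = g_goodness_of_fit(lumpy, uniform)
df = len(uniform) - 1                              # degrees of freedom
```

Perfectly even counts give a G of zero, and the further the counts stray from the expected proportions, the larger G gets.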
Okay, so assume instead that most pollsters aren't making up their numbers. Not that their numbers are necessarily accurate, but that they're at least not making them up off the top of their heads. So using the data from all pollsters as a template, how likely is the Strategic Vision distribution? That's a G test of independence: http://udel.edu/~mcdonald/statgtestind.html. We could use Fisher's exact test, but I can't find one that will do a 2x10 table.
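The independence version works on a 2x10 table (one row of digit counts per group) and takes its expected counts from the row and column totals instead of a fixed distribution. A minimal sketch, again with made-up counts and a hypothetical function name:

```python
import math

def g_independence(table):
    """G statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += o * math.log(o / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return 2.0 * g, df

# Made-up trailing-digit counts (digits 0-9), for illustration only.
one_pollster = [50, 30, 40, 45, 50, 55, 60, 90, 80, 50]
everyone_else = [480, 470, 500, 520, 530, 540, 520, 510, 490, 470]
g, df = g_independence([one_pollster, everyone_else])

# Two rows with identical proportions should show no disparity at all.
base = list(range(10, 20))
g_same, _ = g_independence([base, [3 * x for x in base]])
```

A 2x10 table has (2-1)*(10-1) = 9 degrees of freedom, which matches the d.f. used in the calculation.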
Plugging in the data, we get G=43.068, d.f.=9, which gives p=2.09x10^-6. The blog author was actually a little careless when he said the chances of Strategic Vision's results are millions to one against. If you insist on the equal-probability theory then the odds are 70 quadrillion to one against Strategic Vision and 166 to one against the industry as a whole. Taking the more realistic approach that the industry average is a better representation of the actual probability, the odds against Strategic Vision's results are about half a million to one. Not millions to one, but close enough.
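The step from G=43.068 with 9 d.f. to a p-value can be checked without a spreadsheet. The sketch below uses the standard closed-form series for the chi-square upper tail, which only works for odd degrees of freedom; `chi2_sf_odd_df` is a hypothetical helper name. If SciPy is available, `scipy.stats.chi2.sf(43.068, 9)` would be the more usual way to get the same number.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, odd df only."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form only covers odd degrees of freedom")
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))   # the df = 1 contribution
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term                       # adds chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series

p = chi2_sf_odd_df(43.068, 9)   # G and d.f. as quoted above
one_in = 1.0 / p                # "one time in about this many"
```

This lands on roughly 2.09x10^-6, or about one in half a million, in line with the figures above.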
If you take one with a uniform distribution then you would expect to find one with a greater or equal disparity to the one observed once every 70 quadrillion. If you take a distribution corresponding to the industry average, you'd get a result disparity greater than or equal to strategic vision's, on average, one time in about half a million.
I worked it out above. ;)
If you accept his initial theory that the digits should be equally probable then it's a multinomial exact test or a G test of goodness of fit. If you observe, as he did, that the industry average supports a slightly different distribution then you can compare SV's results with the industry average using a Fisher's exact test or a G test of independence. They're simple tests, and no corrections are necessary unless you do multiple comparisons, which is not the case here.
I know this might be slightly off-topic, but I think that the issues Slashdot has been having are due to an unexpected spike in traffic after they posted the story of how 3D Realms was switching over to Epic's Unreal Engine for the upcoming Duke Nukem Forever. I'm pretty stoked about this and am saving up to be able to afford a Voodoo2 - DNF is gonna be da bomb!!
Now that I think about it, I'm pretty sure everything I just said is completely wrong.
It's not uncommon for polls to be conducted in a less than objective fashion. For example, a pollster might play a series of carefully selected audio clips from a political debate, which are designed to make one candidate look better than the other, and then ask the subject their opinion of the candidates. The goal is to "push" an opinion on the subject rather than collect information. If a push poll is successful, the data is going to be skewed. And after applying Benford's Law you're probably going to see a lot of 7s.
Strategic Vision LLC states in the first line of the first para on their political website: "At Strategic Vision Political, we craft winning results for our clients." Taken figuratively, that means that they "cook the books" and they are pretty up front about it. This /. story is just another "Let's Ban Photoshop from Advertising" sensation.
I think therefore I can't be ~TTNH
I do understand statistics. While I agree it is highly unlikely that I would use the phrase 'statistically impossible', there is some nonzero probability.
I've been following Nate ever since the 2008 elections, and I've much enjoyed his analysis. Being a mathematician, I can spot BS math, but Nate usually does a decent job with no BS. But this article has so many analytical gaps that I feel awkward supporting him this time, even though the article as a whole is convincing. To make such a bold claim as he is, I would've expected him to assess this more completely. He did no comparisons to other pollsters, and sampled data that is not IID (independent and identically distributed), i.e. if a boolean poll has 49% for one side (9) the other answer has to be 51% (1). The last digits (1 and 9) are completely dependent. Not all polls are boolean, but there will still be correlations, and many polls in the sample are boolean. Not only that, but he mis-applied the reference to Benford's Law. I know he knows what Benford's law is, because he's had multiple other posts about it, but got it dead wrong in this article.
I'm glad there is someone sufficiently mathematical to look for things like this and have a wide enough audience to be heard, but I wish he'd taken some more time to look at more control groups and do some confidence intervals before sticking his head into a potential legal mess.
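The dependence between the two sides of a boolean poll is easy to enumerate. This small Python sketch assumes the idealized case where the two answers sum to exactly 100 (no undecideds or nonrespondents, which real polls usually have); `trailing_digit_pairs` is a hypothetical helper name.

```python
def trailing_digit_pairs():
    """Map each trailing digit of side A to the trailing digit(s) forced on
    side B when the two sides of a two-way question sum to exactly 100."""
    pairs = {}
    for a in range(1, 100):
        b = 100 - a
        pairs.setdefault(a % 10, set()).add(b % 10)
    return pairs

pairs = trailing_digit_pairs()
```

Every digit maps to exactly one partner (9 with 1, 8 with 2, and 0 and 5 with themselves), so in that idealized case counting both sides of a question doubles the sample without adding independent information.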
First off, the idea that a website or group of people is distorting polls doesn't surprise me all that much. The fact that they are "talking to their lawyer" seems to exude guilt, personally. I myself would not find it mind-blowing if in fact polls were not the truth in many cases. To actually think that all polls are true at the core would seem to show a certain amount of ignorance. In some cases it would seem troubling to think that polls large enough to be used by mass media are being forged. The questions I would want answered are why exactly, for whom, and for what purpose the polls were messed with. Is there a certain group behind it? Is there a pattern in what the polls suggest or agree with? In the end, if this comes out to be true, I wouldn't be surprised; at this time it would take a lot to surprise most people about mass media, I believe.
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
If you understood thermodynamics, you'd know that 'statistically impossible' is why the world doesn't go crazy. Like sudden appearance of vacuum when you try to breathe or random melting of spoon when stirring your coffee.
ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
I have been programming accounting software for almost fifteen years and the first nasty lesson I learned was that data can be presented in unlimited ways and if you want to get paid you better make it look good. Change the scale, oversample, skew the questions and all sorts of other nasty tricks are now par for the course.
We now have well respected polls contradicting each other by double digits because of the politicizing of any information that might change voters' opinions. I never thought that I would long for the post civil war years of rapprochement and unity.
yeah this is gonna kill my existing mods...
I'm from the future.
Hate to break this to you but DNF has new ownership. And to hell with the temporal prime directive. This part of space is all fscked up anyway.
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
So you would say "unpossible" then?
I can't believe I even penetrated that wall of text far enough to where you start spouting nonsense
Huh?
they believe the two parties cooperate to keep smaller parties from gaining traction
maybe you're just lucky you haven't been dealing with the kind of kooks i've heard from
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Statistically Impossible may well have meaning. In Cosmology, various people at various times (Hawking, Guth, Dirac, and Einstein, in the late 40's working with Minkowski and Godel) all found that they had to write a few pages on whether very improbable events were distinguishable from zero probability events before they could justify using some of their math. All were working on their own takes on the origin of the Cosmos problem at the time. Most of them decided that any event with a probability of less than 1 in the whole lifetime of the Cosmos was 'statistically impossible' and not just 'improbable'. Rosen later argued that it was better to phrase it in terms of less than 1 during that part of the cosmos's lifetime when entropy was low enough to allow other events of that same energetic magnitude to happen normally rather than the whole lifetime, and others have debated the point various ways, but it's still common to call some things statistically impossible when doing fundamental cosmology.
Oh, and I need a new spoon.
Who is John Cabal?
I've just checked, and it's pretty easy to generate last digit distributions that look a great deal like the one shown for Strategic Vision. If you assume they poll over contentious issues (which are divided close to 50/50 in the population opinion) and that there are a small number of nonrespondents, then you get distributions with lots of 49s and 48s, and fewer 41s. My sample histograms even reproduce the spike at 0, and the peak at 7 or 8. This is 10 lines of code in python:
from pylab import *
mnvar = 2 # std. deviation from 50/50 for each question
nonresp = 3 # mean nonrespondents on each side
ssize = 10000 # number of questions
a1 = floor(normal(50, mnvar, ssize//2)) # first group answers (// keeps the size an int in Python 3)
a2 = 100-a1 # the second group, their opponents
a1 -= poisson(nonresp, ssize//2) # nonrespondents in the first group
a2 -= poisson(nonresp, ssize//2) # nonrespondents in the second group
a = concatenate([a1,a2]) # put them all together
hist(mod(a,10)) # histogram of the trailing digits
Obviously, I didn't choose any numbers by hand. It seems at least reasonable that pollsters might focus on questions that are close to evenly divided in the population. So, while there's no excuse for not publishing your methods, there is at least one innocent, and quite plausible, explanation for this distribution.
the police are too far away. so we have a status quo here currently in the usa where hundreds of urban dwellers die every year from thugs with guns for the sake of a law which serves only the rural minority. but as the usa continues to urbanize further, and begins to equal european urban/rural ratios, political status quo will fall in line inevitably
and instead of HUNDREDS of urban dwellers dying every year for the sake of rural-friendly laws as we currently have, DOZENS of rural folks will die instead for the sake of urban friendly laws
inevitable. deal with it
"I am not FRINGE because I don't vote."
that's true. you're SELF-DISENFRANCHISED because you don't vote. your vote is your voice in your society. if you seek to not vote, you have willfully removed your own voice, you have chosen to be irrelevant. so why are you still fucking talking? you seek to not be a member of society. which is fine, drop out if you like: in which case, shut up and stop commenting on a society you freely choose not to belong to. if you want your opinion to be considered by us in this society, try to be a part of it by voting, and make your voice heard
but you don't get to drop out of society by your own choice and still think anything you say is relevant
if you want to be relevant, vote, and consider yourself to be a member of the same society as me. or don't, and, in logical coherence with that choice of yours, shut the fuck up
otherwise, there is absolutely zero for me to respect about anything you say, because by your own admission, you choose to not matter to me by not voting
oh you have your gun. awesome: why solve problems with voting when you can shoot, is that your point of view? fucking schizophrenic loser
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Problem is, there are so many undecided that many elections just hinge on their vote. For example, if 40% decide long in advance for the Dems and 40% long in advance for the Reps, then in the very end those 20% make the decision. If it was 60% Rep and 20% Dem (or reversed) then the undecided would not matter. And indeed polls CAN influence *SOME* undecided; I have seen it in action in my family. It is anecdotal evidence, but it is enough to say that SOME undecided will be influenced by a poll, and about as much by a fake poll.
C. Sagan : A demon haunted world:
http://www.amazon.com/gp/product/0345409469/
visit randi.org
Bingo. I like your explanation.
I once had a signature.
The problem isn't precisely that he performed multiple tests. It's that he formed the hypothesis and then tested it using the same set of data. That amounts to having performed multiple tests, but isn't quite the same thing. That's a fairly subtle problem to correct, and I can't claim to be well versed enough in stats to know more than that it exists and can be dealt with to a degree through sufficient cleverness.
Tests like Fisher or KS are tricky to apply here — because they polled different sets of questions, we don't precisely expect that SV's results are drawn from the same distribution as the general set. What we expect is that they won't be "too different". That's a hard thing to test, and is the reason why Silver makes the argument from generalities rather than p-values.
I am Spartacus!
Web 2.0
Blank until
You guys have to forgive Rob ... he's just started learning Perl. Maybe he needs more coding experience! Give him some time, he'll get around to migrating our Chips & Dips stories eventually.
Don't forget guys, we celebrate our 222nd Independence Day in 2 weeks!
have you considered an alternative to FPTP voting? With the instant-runoff system (and plenty of other preferential systems), you still end up with 2 main parties, but you also get a few seats held by minor parties and independents, because people don't have to worry about tactical voting and so can safely vote for a minor candidate. At the same time, you still have a specific representative for each area, which is a big advantage over proportional representation.
It would be hard to get it implemented, because it would harm the existing 2 parties (not so much because they will instantly become unable to gain a majority, but because other parties will become more relevant), and so they would block it.
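For anyone unfamiliar with how instant-runoff counting goes, here is a minimal Python sketch. The party names and ballot counts are invented for illustration, the alphabetical tie-break is an arbitrary assumption, and real election rules differ in the details.

```python
from collections import Counter

def instant_runoff(ballots):
    """Return the IRV winner. Each ballot lists candidates, most preferred first."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes this round
        # (ties broken alphabetically, an arbitrary choice for this sketch).
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)

# Hypothetical electorate: two big parties and one minor party.
ballots = ([["Red"]] * 45
           + [["Blue", "Green"]] * 40
           + [["Green", "Blue", "Red"]] * 15)
winner = instant_runoff(ballots)
```

In this made-up example Red leads on first preferences, but once Green is eliminated its ballots transfer to Blue, so voting Green first costs those voters nothing.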
I did a statistical analysis of the year 2000 "recount" almost 9 years ago, looking at the counties with "unusual" results.
There were six counties in which the changed votes didn't fit the normal bell curve, four benefiting Gore and two Bush.
Both of Bush's and one of Gore's had rules in which replacement ballots were made for idiot voters who used an X rather than filling the bubble, explaining them.
One of Gore's had machine problems in the recount and stuck with the original figures.
And then there were the two counties, which accounted for the lion's share of the "correction" from the recount.
One of them was 50 standard deviations out--so far out that it is less likely than winning the California Lottery every week for thirteen weeks running . . .
I wasn't the only one to notice the oddity, but the sad fact is that no one cares . . .
hawk
I love how the ".biz" TLD is effectively the "evil bit".
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
>>It would be hard to get it implemented, because it would harm the existing 2 parties (not so much because they will instantly become unable to gain a majority, but becuase other parties will become more relevant), and so hey would block it.
Yeah. The chances of it getting implemented nationwide are somewhere between slim and none.
Also, instant runoff systems and the like would still not give us a Libertarian president, Libertarian fantasies aside (everyone would vote Libertarian if it didn't mean they were throwing their vote away!) or even a congress-critter. Maybe in some state legislatures you'd get some 3rd party representatives.
Also, no election with more than 2 candidates is immune to strategic voting. I proved that once, I think.
You really think that the only people who want guns legal are rural? And that the laws are "rural friendly" in that regards? I've got news for you, the vast majority of gun owners and enthusiasts are urban dwellers, and that isn't looking like it's going to change anymore now than it has in the past couple of decades.
In addition, you really think that the majority of murders with weapons wouldn't happen without weapons? People murdered each other before guns were invented, removing them might make a few cases go away but won't impact the vast majority of homicides.
Good luck voting to stop that bear from getting you by the way, I'm sure he'll listen to your excellently thought out democratic system of determining who he should eat next.
-Someone who owns no guns but isn't dumb enough to think guns are the root of problems humanity has had to deal with for centuries before the discovery of gunpowder.
There are two kinds of fool One says 'This is old therefore good' Another says 'This is new therefore better'- Dean Ing
Thank you. I think you meant to reply to the reply to me though. +1 insightful
>>Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
He just means it is very very unlikely, of course (but you knew that).
I think what he was trying to say was that any result you end up with can be argued to be very implausible. For example, any 52-choose-5 hand of cards you ended up with... well, it was very very unlikely you would have gotten those exact cards (about 1 out of 2.6 million), so a wag could claim the deck was stacked since there was only a one-out-of-2,600,000 chance that you would have drawn those cards. (The key point being, of course, that every hand of cards is exactly as likely as any other.)
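The count of five-card hands is a one-liner to check in Python (`math.comb` needs Python 3.8 or later):

```python
import math

hands = math.comb(52, 5)       # number of distinct 5-card hands from a 52-card deck
p_specific_hand = 1 / hands    # chance of being dealt one particular hand
```

That comes to 2,598,960 possible hands, so any particular hand has a probability of roughly 1 in 2.6 million, whether it is a royal flush or a pile of junk.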
Not saying if the author in TFA was right or not. It's Slashdot - I haven't read it, of course.
. . . would a factual post be modded troll. I guess factually correct != politically correct.
Oh shut the fuck up. The guy's a statistician and has proven quite a good one at that. You barely grasp the law of large numbers. Fucking know-it-alls thinking they get all about anything and dismiss claims of experts without even a grasp of the basics of the topic at hand, get the fuck off my Internet.
You just got troll'd!
A quick google will point to the wrong company.... Unfortunately, TFA sometimes just specifies "Strategic Vision."
Serious question: could all of this just be accounted for by a really bad coding error or data preparation algorithm somewhere when it comes time to round numbers off, such that the trailing digits are non-random? I'm thinking about something like not carrying enough significant figures through the calculation or doing a "floor()" or "ceil()" instead of a proper rounding operation. There's a lot of potential for mangling the results between taking a (supposedly) representative poll across a whole country or region and then trying to scale it up to the full value.
In other words, even if the data is weird, are there innocent explanations? Is it an example of the all-too-common "don't attribute to malice what can be attributed to stupidity" scenario?
I know this might be slightly off-topic, but I think that the issues Slashdot has been having are due to an unexpected spike in traffic after they posted the story of how 3D Realms was switching over to Epic's Unreal Engine for the upcoming Duke Nukem Forever. I'm pretty stoked about this and am saving up to be able to afford a Voodoo2 - DNF is gonna be da bomb!!
I can sell you a Voodoo2 pretty cheap, so you can save your monies for one of those new Pentium 4's.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Science!
You don't need fraud to lie using statistics!
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
If you understood thermodynamics, you'd know that 'statistically impossible' is why the world doesn't go crazy. Like sudden appearance of vacuum when you try to breathe or random melting of spoon when stirring your coffee.
Yes, I've studied thermodynamics and statistics. The problem is with the term "statistically impossible." There is a finite probability that such an event could occur; however, it is so small that one would never expect to see it occur.
The real issue is many people do not understand statistics, and as a result act in irrational ways. They truly believe a 1-2-3-4-5-6 draw is less likely than some more random-looking set of digits; yet the probability is the same for both events. Of course, those that do understand statistics can use that lack of knowledge to their advantage - casinos are a good example of this. Or, if you play lotto, you can pick combinations that don't increase your odds of winning but do increase the odds of being a sole winner. Or, on a coin flip, see what odds you can get on a 4th head coming up after 3 previous ones.
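The coin flip case can be checked by brute force. A minimal sketch, assuming a fair coin and independent flips:

```python
from fractions import Fraction
from itertools import product

# All equally likely sequences of four fair coin flips.
sequences = list(product("HT", repeat=4))

# Keep only the sequences that start with three heads.
after_three_heads = [s for s in sequences if s[:3] == ("H", "H", "H")]

# Fraction of those where the fourth flip is also heads.
p_fourth_head = Fraction(
    sum(1 for s in after_three_heads if s[3] == "H"),
    len(after_three_heads),
)
print(p_fourth_head)  # 1/2
```

Anyone offering better than even odds against the fourth head is giving money away.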
I'm a consultant - I convert gibberish into cash-flow.
I don't see where he forms his hypothesis based on the data. His description isn't crystal clear, but it sounds like he hypothesizes the digits should be uniformly distributed first, looks at the ensemble data for confirmation, then looks at SV's data.
Rereading the blog post, I think he DID make a mistake - he just took a look at the ensemble data and decided that it was close enough to evenly distributed, without testing it. He actually says: "But it is close to random, and could fairly easily have occurred through chance alone." This is not true - the probability of the ensemble data being drawn from a uniform distribution is 0.006 - it is very unlikely the observed distribution arose by chance alone.
As for the ensemble data and SV's data being from different questions, remember, we're not comparing poll results, we're comparing the least significant digit. In a large enough sample there's no reason why different questions should change the distribution of the least significant digit. There is a possibility that the least significant digit could be distributed differently in different types of polls - close presidential races versus landslide primaries, for example, but from the description and size of the two data sets I think it's likely they have sufficient breadth to even out such biases. If you were really worried about it you could take a subset of the ensemble data that very closely matches SV's and look at its distribution.
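The kind of uniformity check discussed above can be sketched as a Pearson chi-square test. The digit counts below are made up for illustration (they are NOT the numbers from the blog post), and chi_square_uniform is my own helper name.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical tallies of trailing digits 0-9 (made up, not real poll data).
digit_counts = [520, 480, 505, 495, 530, 470, 510, 490, 515, 485]

stat = chi_square_uniform(digit_counts)

# Ten digits give 9 degrees of freedom; the standard 5% critical value
# for chi-square with 9 df is about 16.92.
CRITICAL_5PCT_DF9 = 16.92
print(stat, stat > CRITICAL_5PCT_DF9)
```

A statistic above the critical value would mean the counts are hard to square with a flat distribution. It says nothing about why.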
First, the example he gives where he looks at polls from ALL sources is an example of a plausible distribution of real results because, assuming the majority of pollsters are not cooking their data, the data should be dominated by randomness.
Here is the thing. Did he begin with the theory that Strategic Vision was fraudulent, or did he begin with the theory that some pollsters were fraudulent?
After all, he was churning through a lot of pollsters' data.
Isn't it quite possible that he was simply mining his massive dataset for something, anything, that made any pollster look bad?
In short, how likely is it for one legitimate pollster out of many legitimate pollsters to have data that isn't quite normal (pun intended?)
"His name was James Damore."
A conservative poll organization that's not accurate? Gasp! Why, soon you'll be telling me that Rasmussen polls aren't accurate either!
Goodness, what will O'Reilly, Beck and Limbaugh do without some sort of circular reference clusterfuck to draw on?
Please do not read this sig. Thank you.
From his posting, he talked to SV about their refusal to reveal their methodology, then decided to test to see whether their results showed any suspicious bias. He was specifically testing SV and not searching for any pollster.
You're right, if he tested multiple pollsters then he'd have to correct for multiple comparisons. Even so, you'd expect results as bad as SV's about one time in half a million. There aren't that many major pollsters, so you could detect results skewed as badly as SV's to a high confidence level using a data mining technique.
Oh, and by the way, the last president we ever had who was actually a president and not a shill for the wealthy was Jimmy Carter.
FYI, you're going to be old so quickly, it'll make your head spin.
Now get off my lawn!
Please do not read this sig. Thank you.
I'm sick and fucking tired of web sites that are a slim strip of content down the middle, with horseshit on the side.
You're a dumbass. RTFA and you'll see how likely these "patterns" are given the huge sample size (tens of thousands of numbers).
I mean seriously, what fucking arrogant cunt would think that a well regarded expert would fall for the most basic of mistakes of his domain of expertise, and that you'd be the one to point it out? Goddamn basement intellectuals...
You just got troll'd!
HaHa! What an excellent way to get people to listen to you.
Exactly how many times have you had to shoot someone? My guess is zero. So either you're completely paranoid or trying to rationalize your inferiority complex.
Huh? What ACORN thing? You sound like another conspiracy theorist. Just in case you haven't been paying attention (and you haven't been apparently) there was never any voter fraud associated with ACORN. It was never even in question.
Your rambling incoherence just proves you are a conspiracy theorist. Why should anyone listen to you?
Exactly how did people win because of guns? You haven't made a single point, instead you insist on rambling on about your wild eyed conspiracy theories.
Time makes more converts than reason
From his posting, he talked to SV about their refusal to reveal their methodology, then decided to test to see whether their results showed any suspicious bias. He was specifically testing SV and not searching for any pollster.
I suspect that refusal to reveal methodology is quite common, given that most are agenda-driven. Did he only speak to SV, or did he speak to lots of pollsters who refuse to reveal methodology?
Even so, you'd expect results as bad as SV's about one time in half a million. There aren't that many major pollsters, so you could detect results skewed as badly as SV's to a high confidence level using a data mining technique.
But there are LOTS of ways (infinite, really) to "test" data, so even if there are only 50 pollsters, you can still end up with millions of chances of finding arbitrary million-to-one outliers (where a lack of outliers would actually be suspicious!)
Is this second-digit test a common test for normal distribution, or is it an unusual method?
"His name was James Damore."
Every time a conservative does something fraudulent or immoral--which is constantly--all you have to do is scream, "OMG Teh Democrats Did It Too!" and that fixes it! Even if it's not really true. It deflects attention from your cherished Party and makes you feel a little better about yourself. Bravo!
Actually, I think most of the large polling organizations are pretty good about releasing their methodology. From the sound of it, this one is kind of an exception, and has taken a lot of flak for it.
You're certainly correct, if you go around comparing things long enough you're likely to get a false positive, unless you correct for multiple comparisons.
I've never actually seen any second-digit analysis before, but election and poll fraud isn't my field. I expect a lot of election monitoring would use similar techniques. A fair bit of work has been done looking at distributions of digits, including Benford's law and I believe work that shows that the fourth and on digits are actually uniformly distributed. There is also some psychological research looking at patterns in numbers that people tend to select. I seem to recall that if you ask a large group of people to pick a number between one and a hundred, a disproportionate number will pick either 32 or 36. It's a trick used by psychics - ask a large audience to pick a number between one and one hundred. Then ask whoever picked 32 to put up their hands. An impressive number of hands go up. Next pretend to be a little uncertain and say, wait, I'm also getting a strong signal on 36... and suddenly a bunch more hands go up. In a big audience it suddenly looks like most of the hands are up and now you can take their money.
Sure, in an unattainably perfect world with perfect election systems, this would be true. However, one must note that it's impossible to have a single-winner voting system where more than two candidates stand for election where strategic voting is not rewarded if voting actually matters at all.
In the real world, strategic voting which takes into account the preferences and likely behavior of other voters, assuming it is based on accurate information, produces better results than blindly voting your own true preferences.
Even ignoring the incentives for strategic voting, though, there is a cost-benefit analysis in pre-voting activities which affect the success of candidates and ballot propositions -- even if a person believes something is a good idea and plans to vote for it, they are far less likely to expend resources (whether by donations of money or of time and effort) if they feel that those resources are unlikely to make a difference in the outcome.
So, ultimately, there are good reasons why people's understanding of the popularity of a political idea or candidate affects their behavior regarding that idea or candidate.
Don't bother. The existence of spoons has already been disproved by Wachowski, Reeves et al. in 1999.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
Take any data set and you'll find patterns that are statistically impossible.
Not if you understand statistics.
Also note: If you understand statistics you would _never_ use the phrase 'statistically impossible'
I think his point is that you can apply the Texas Sharpshooter fallacy to make any set of numbers _appear_ to someone who does not understand statistics to be improbable. (although I may be embellishing his words a little)
Liar. Just look at our incarceration statistics!
I don't really have an opinion on gun control but I think this is wrong:
Premeditated murders maybe, but crime in general is greatly assisted by the availability of guns. The problem is that they're just so powerful. If you go into a bank with a knife and start waving it around and telling people to get on the ground they're just going to run away. But pull out a gun and everyone 10 meters around is going to obey every word because you can kill them instantly.
And people are defenseless against a gun but they can at least run or throw a chair or punch an attacker with a knife. And gun killings are easy and impersonal while with a knife the attacker has to struggle and get covered in blood and listen to screams or whatever.. much nastier
Swords are a problem I guess but they're impossible to carry concealed
wat
Which would be mighty hard unless they all asked the same questions of the same people. By looking at methodology (what states were polled, # of likely voters etc) there should be some commonality.
Instead, he used the results of all of their polls over a 5 year period (both controversial and non-controversial subjects). That should have created a suitably random result. It did not.
And ye shall know the truth, and the truth shall make you free.
John 8:32(King James Version)
Sorry, I don't really follow what you're getting at (despite the gratuitous bold face). If you're suggesting that the industry-wide data shows widespread fraud because the digits are not equally probable, you're likely wrong. There are lots of ways a slight bias could creep into the second significant digit. It's also unlikely the whole industry is making up their numbers.
If you're suggesting that the SV data is highly irregular, you're preaching to the choir. However, the SV data is not suspicious because it does not show a uniform distribution, it is suspicious because it is significantly different from the observed average distribution.
Yes, because the fact that I use insults and profanity validates your original claims! "Hur dur there's shapes in clouds therefore it's perfectly normal that out of tens of thousands of random numbers there should be numbers that occur twice more often than others. Teh expret is teh stupid!!11"
You just got troll'd!
I vote about half the time... because I don't vote strategically, I vote my desires - which means I end up "throwing my vote away", which in turn discourages me enough that I don't vote the next time.
Premeditated murders maybe, but crime in general is greatly assisted by the availability of guns.
True, but the real point is that laws banning guns won't stop crime, since so many of the guns used in crime are already illegal and/or illegally obtained. It will reduce crime, though only slightly.
Unfortunately, gun control laws never have the desired effect, for the simple reason that if you're going to commit bank robbery, you don't really care about gun laws. The only place I've ever heard of having gun laws that work is Japan, for the simple reason that they have had an absolute ban for 60 years now, and therefore nobody has them, versus in the US where if they're banned, law abiding citizens will turn them in, while criminals won't give a crap.
Gun control laws in the US could work if you could simultaneously destroy every gun in the US and start fresh, but other than that, they're a band-aid on a missing limb.
That's true. Criminals will get guns anyway.
But drastically reducing the number of guns in the country would drive prices way up on the black market. If every gun had to be stolen from a cop or smuggled across the border then they'd be way expensive.
You've posted this idiotic response both here and on TFA site. Nate Silver is stating that the values are not a product of a uniform distribution on the grounds that the outliers are many SDs away from the mean. If by meaningless you are asserting the null hypothesis, this is precisely what has been determined to be extremely unlikely.
Please shut up.
The distribution could be explained if the editor of SV was a numerologist. Suppose that SV had an opportunity to run one of two valid poll results -- one that puts McCain ahead 51-46 among white plumbers and one that puts him ahead 49-47 among white electricians. The numerologist editor decides that they should go with the electrician poll on the basis that numbers ending with 7 and 9 are luckier than those ending with 6 and 1. The plumber poll never gets published. The numerologist editor theory explains the distribution without invalidating the polling methodology.
I accept your apology!
BTW. The Max "8" at 676 is not "twice more often than" the Min "1" at 431. I'm sure it was just a rounding error on your part, in the middle of a discussion of statistics. ;)
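The two counts quoted above are enough for a quick back-of-the-envelope check. This sketch assumes only that 8 and 1 ought to be equally likely as trailing digits, and looks just at the results ending in one of those two digits. Picking the biggest and smallest digit after the fact overstates the case somewhat, so treat the z-score as a rough guide.

```python
import math

eights = 676  # count of trailing 8s quoted in the thread
ones = 431    # count of trailing 1s quoted in the thread

ratio = eights / ones

# Assumption: if 8 and 1 were equally likely, then among the 676 + 431
# results ending in either digit, the number of 8s is Binomial(n, 0.5).
n = eights + ones
mean = n * 0.5
sd = math.sqrt(n * 0.5 * 0.5)
z = (eights - mean) / sd

print(round(ratio, 2))  # 1.57
print(round(z, 1))      # 7.4
```

So not "twice as often", but still more than seven standard deviations away from an even split.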
Who gives a shit, I didn't even bother to verify in the article. Cause no one gives a shit. Except a sucker like you who'll try to hang on to anything that would help make him sound less like the dumb one of the two of us. If you're not completely retarded you can see that there's no way you can have 676 8's and 431 1's with random numbers of those ranges and distributions.
You just got troll'd!
The issue isn't the hypothesis he forms about the second digits. The issue is that he forms a hypothesis that SV is producing strange results, based on their poll data produced to date, and then tests that hypothesis using the same data. He could have performed multiple tests and eventually decided the second-digits test looked interesting, but we'll assume Silver is a more honest statistician than that, and that the second digit test is the first test he did. No correction is required there.
Where the correction is needed is a step earlier -- it would have been equally reasonable to form a hypothesis that any other pollster was behaving badly; he chose to pick on SV only after looking at their poll results. This is perfectly reasonable and normal, however it requires a multiple-tests correction. If he has 100 pollsters in his database, we expect one of them to have a second-digits distribution that is weird at the p=0.01 level. So he needs to correct his p-value upwards by at least the number of polling agencies with a significant number of polls in his database. However, the SV p-values are so tiny that they're still very highly significant even after any plausible correction.
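The correction described above can be sketched with a simple Bonferroni adjustment. The figure of 100 pollsters is the hypothetical one from this comment, the one-in-half-a-million p-value is the figure quoted earlier in the thread, and bonferroni is my own helper name.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

n_pollsters = 100  # hypothetical number of pollsters in the database

# Expected number of honest pollsters flagged at p = 0.01 by chance alone.
expected_false_alarms = n_pollsters * 0.01

# "About one time in half a million", as quoted earlier in the thread.
p_sv = 1 / 500_000
p_corrected = bonferroni(p_sv, n_pollsters)

print(expected_false_alarms)
print(p_corrected)
```

Even after multiplying by 100 the corrected value is only about 0.0002, which is the point: the correction is required but does not rescue SV.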
in fact, i believe that if guns were banned death by knives would go way up. but not as high as mortality rates due to guns: its simple matter of quantity of lethal force available to you. a knife is many factors less in lethality than a gun
i also believe that if someone still wants a gun, and they are committed enough, they will still get guns even though they are illegal. but that's not the point either. the whole issue are the thuggish morons who don't have the easy wherewithall to get a gun, who would do their typical retarded mayhem with far less lethality than they do today
making guns harder to get takes them out of the hands of your casual loser. that's the whole point. that's the beginning and the ending of the whole reason to constrain guns. and that matters, that makes a massive difference in mortality rates. which is the whole fucking point!: less pointless senseless death at the hands of complete losers and morons simply because they have less lethality in easy reach
now accuse me of some secret fascist agenda instead. zzz, tired and typical
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
when every conflict was decided by lethal force. there are still places today like that. like somalia: warlords and anarchy
however, other parts of this world developed this wacky thing called civilization. where there are rules, people decided their differences in a court of law, and they voted for their leader, rather than their leader being determined to be the guy with the most guns. its called civil society. where if i have a difference with you, i TALK to you, rather than SHOOT you. isn't that an amazing concept?
"Yes, YOU are part of the problem with your holy "vote" which you are stupid enough to think changes anything. YOUR hand is on the rope--and YOU are one of the reasons GUNS are needed--lots and lots of GUNS. How do you like my making myself heard, NOW, victim disarmer?:)"
my vote does actually change things. and my vote has changed things. and with any hope and luck, my vote will continue to move us further away from the era where mortal conflict decided things, which is apparently the only reality you understand. you and people like you are the soil in which tyranny grows. where force of violence is more important than force of reason, where strength is more important than intelligence. you and people who think like you represent our downfall and the antithesis of everything the founding fathers of this country stood for. you and how you think represent the loyal ranks of every force of goons every tyrant has ever needed to keep his people in fear and under his boot: i have the gun, so do as i say. the gun decides the day, not what is actually right and wrong. there are many warlords and their henchmen in somalia who agree with you completely. maybe you should take your masturbatory soldier of fortune fantasies to their logical conclusion, and ship off as a mercenary to some hellhole, where in 6-12 months you will be a maggot laden corpse, which is the inevitable conclusion of your way of thinking
sir, i've talked to many pathetic losers on this here glorious intarwebs. and you sir, are a glorious loser of the highest order: a gun is more important than a vote
fucking pathetic beyond belief. the antithesis of everything the founding fathers stood for
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
with a two party system you point out i do not dispute, and agree with wholeheartedly
but the error in your thinking is that 3,4,5,6, etc. party systems are somehow superior to a two party system. multiparty systems suffer from different negatives than two party systems, but add up all the negatives, and they are basically all the same amount of suckage
in other words, looking at two parties as the source of our problems is the mistake. no, it's more ephemeral reasons: corruption, collusion, hypocrisy, selling out your constituents, etc., which have absolutely nothing to do with how many parties there are at all
when i ask people to stop criticizing the two party system, it is not because i defend the system, but because i know the source of the problem is much deeper. if the usa had a stable 3,4,5,etc party system in place, we would have just as many problems, of the same kind
all i am saying is that complaining about the two party system is simply a red herring, a dead end. if your desire is to effect positive change, fight the real, deeper reasons instead. fighting the two party system is simply wasting your time and energy: fruitless even if you did create more stable parties
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Without knowing more about the nature of the polls, it is just plain silly to make the claim that because the distribution is not random, there must be fraud. For a wonderful example, look at the distribution of movie ratings by viewers. Assuming that you're using a 1-5 scale, you'll see that there are far more 1's and 5's than anything else and very few 2's. If a person likes something they are more likely to give a 5 than a 4 and if they didn't they're more likely to give a 1 than a 2. If you want to see for yourself, go download the dataset used for the Netflix prize and check the distribution. Or, pick a random movie from IMDB and look at the distribution. You'll see that the number of votes for a rating of 1 is far more than those for 2 or 3. The proper way to do this analysis is to compare the distribution against the expected distribution, not to compare it against a uniform distribution. If I saw a uniform distribution of movie ratings, I'd cry foul.
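To make the "compare against the expected distribution" point concrete, here is a rough sketch. All the ratings and reference shares are invented for illustration (they are not Netflix or IMDB numbers), and chi_square_vs_expected is a hypothetical helper.

```python
def chi_square_vs_expected(observed, expected_share):
    """Pearson chi-square of observed counts against expected proportions."""
    if abs(sum(expected_share) - 1.0) > 1e-9:
        raise ValueError("expected_share must sum to 1")
    total = sum(observed)
    return sum(
        (obs - total * share) ** 2 / (total * share)
        for obs, share in zip(observed, expected_share)
    )

# Hypothetical 1-5 star ratings for one movie (made-up numbers).
observed = [300, 80, 120, 200, 300]
# Hypothetical reference shares, e.g. pooled over many movies.
reference_share = [0.28, 0.09, 0.13, 0.21, 0.29]
uniform_share = [0.2] * 5

stat_vs_reference = chi_square_vs_expected(observed, reference_share)
stat_vs_uniform = chi_square_vs_expected(observed, uniform_share)
print(stat_vs_reference)
print(stat_vs_uniform)
```

The same made-up ratings look wildly off against a flat baseline and unremarkable against a realistic one, so the choice of baseline drives the conclusion.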
Again, I don't see what makes you sure that he chose to pick on SV AFTER looking at their poll results.
From his blog:
It certainly sounds like he chose to single out SV before he looked at any data and, since we're both willing to assume that he didn't go hunting for a particular test that would make them look bad, that implies he made only one comparison.
If he was datamining then yes, he needs to make corrections.
I had actually been thinking of it not so much as Silver mining for a pollster to pick on and coming up with SV, as the community at large (aka AAPOR) deciding to pick on SV and Silver then running tests. The community at large obviously had all the old poll data, and the correction is required whether Silver did the mining himself or not.
However, on re-reading the section you point out, I'm inclined to think the original accusations by AAPOR, and the subsequent decision to investigate SV by Silver, were both unrelated to the detailed poll results, and based only on the lack of published methodology, in which case you're right that no correction is needed.
OTOH, as Silver puts it (in a later post):
Mr. Johnson may be right that the implication that his data may have been forged could be difficult to categorically disprove. Had the statistical evidence been only marginally compelling, I would not have made it. With that said, I would also tend to treat -- and would encourage those in the media to treat -- "alternate hypotheses" raised by Strategic Vision with some greater-than-usual amount of sympathy. So far, Johnson has not offered any. (Emphasis added.)
I'm inclined to agree with that view. It's not the same as making p-value corrections, but the idea is similar. It also captures the basic idea that the underlying data are not trivial to analyze (SV isn't quite polling the same regions and the same questions as everyone else), which makes the resulting p-values more approximate than is normal for statistical analysis.
And, of course, the numbers are so extreme that no plausible amount of p-value correction would make them look good for SV. It's possible there are hypotheses other than fraud that explain those numbers, though, and we need to be careful to realize that p-values don't distinguish between fraud and something else weird but entirely legitimate -- all they say is "this isn't normal." Of course, the threats of lawsuits and lack of explanations point rather strongly to fraud at this point.
I absolutely agree - Silver started out with a hypothesis that the second digits should be uniformly distributed, which turns out to likely not be true. I (and hopefully he) used the industry average as an expected distribution instead, which seems reasonable, but I'd listen very sympathetically to any suggestions about why SV's data might not be the same as the industry average.
You take much for granted. My politeness was not an apology. But, I can expect no less considering the quality of the discussion so far.
lol, dumbass, learn to not take everything on the first degree.
There were fewer than 6 thousand data points for this analysis. So, your seeming belief this was based on a "huge sample size (tens of thousands of numbers)" is also at risk.
Holy shit retarded batman. It could as well be only 100 points, the difference between eights and ones would still be very significant. God damn!
You just got troll'd!
The key datapoint that Silver brings up is that way too many numbers end in 7. He doesn't mention but this makes it much worse than if it were any other number that appeared too often. When people are asked to pick a random number they are much more likely to pick an odd number and are much much more likely to pick a number ending in 7. In general people are also likely to pick numbers that are close to (3/4)n when asked to pick a random number between 1 and n. Thus, when asked to pick a random number between 1 and 4, about 40% pick 3. For 1 to 10, you get a similar jump of people picking 7, and when asked to pick a random number between 1 and 50 something like 10% or 20% of people pick 37. This looks like textbook data of a human trying to make random data and sucking at it.
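The (3/4)n rule of thumb above is easy to tabulate. A tiny sketch, where rounding down is my own assumption (the comment only says "close to") and three_quarter_guess is a made-up name:

```python
def three_quarter_guess(n):
    """Rule-of-thumb 'favorite' pick near (3/4)n, rounded down."""
    return (3 * n) // 4

for n in (4, 10, 50):
    print(n, three_quarter_guess(n))
```

That gives 3, 7 and 37 for the three ranges mentioned, matching the picks described above.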
What's to be said about my "grasp of concepts like sample size"? Dumbass. You're just the typical nerd who's gotta act like he knows about it all to sound smart, knowledgeable and relevant when your expertise on what you're talking about is very thin and shallow.
You just got troll'd!
I'm also feeling compelled to point out that you barely even tried to fight my claims. All the fuck you could say was "there's patterns in random things", which is a dumbass thing to say when someone is accusing you of not understanding how the sample sized that's being talked about here affects the certainty that we're looking at a non-random pattern. Instead you chose to go "oh you're a bad person because you use profanity". Damn right I use profanity, fagget. But you can't even explain why you think I'm wrong, no matter how hard you try to deflect or use dumb analogies. Cause you don't know shit about the underlying math. You probably wouldn't know a normal distribution if you saw one.
You just got troll'd!
Yeah, but /. brings all that together into the one package. I don't have to search around on 20 different websites to find all that.
people congregate, they are social animals. you can't outlaw political parties. in other words, yes, all of the negatives of political parties are 100% real. and yet its like pointing out that its bad you will die someday: there's nothing you can do to change it, its just a fact of reality, a negative you have to deal with, as there is no getting rid of that negative
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Not by much. Simple guns, more than adequate for murder or robbery, can be turned out from scrap by any high school kid in a metal shop. Juvenile delinquents in the 1950s made zip guns out of car antennas. Guns like the Sten submachine gun can be made cheaply in clandestine machine shops -- thousands of them were made by the resistance in occupied Poland during WWII. (Plans for building a Sten here.)
Gun control keeps guns away from bad guys about as well as drug laws keep junkies away from heroin.
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood
Even if you don't believe Nate Silver, at least recognize that Strategic Vision is at best very sketchy. Of their "7 locations" all 7 are actually UPS stores. They also have spent around $5,000,000 conducting polls but refuse to state their source of funding. They refuse to say who they poll or how they conduct their polls. Even if they didn't outright make up data (which they likely did) they are not to be trusted. This is less about politics and more about valid scientific polling methods and transparency. source: http://www.politico.com/blogs/bensmith/0909/Embattled_pollster_defends_methods.html