[A preliminary version of this post appeared December 07, 2004 at 9:25 PM. A second version appeared December 09, 2004 at 12:22 AM. This is the final version, incorporating this comment by Mark Blumenthal (who runs the blog Mystery Pollster, having earned that moniker from Slate.com’s Mickey Kaus): namely, exit polls, due to their cluster design, possess larger standard errors than the 1/sqrt(N) associated with truly random sampling. The conclusion in all versions was the same, and indeed the case has become stronger with each revision. Namely, with a plausible estimate of the exit poll bias, the entire variation of the data can be ascribed to random sampling error.]

The past week has been marked by a court challenge to the just-certified vote tally in Ohio as well as informal hearings by the Democratic members of the House Judiciary Committee devoted to investigating voting irregularities in Ohio. (Archived audio of this session is currently available here from Pacifica Radio.) As such, I believe it is an appropriate time to revisit the claims of various statistically significant irregularities in pre-election and exit polling that have been promulgated over the last month.

In this post, I shall revisit the exit poll irregularity, namely the overwhelming tendency for President Bush to poll better in the official tally than he did in the exit poll:

First of all, I thank Dr. Steven Freeman, Resident Scholar at the University of Pennsylvania and a witness at the informal hearings of the House Judiciary Democrats held on 8 December, for sharing his compilation of the depicted exit poll data as well as for some helpful comments. Any errors in the plot or the discussion below should be considered my own, and not Dr. Freeman’s.



Figure 1. Comparison of the Bush-Kerry victory margins in the official vote tallies versus those inferred from the vote-by-gender data in the Edison Media Services and Mitofsky International exit poll as this poll stood in the third of the four versions* (i.e., circa 12:20 AM on Wednesday, 3 November 2004) released on the CNN Election 2004 exit poll web page which were archived primarily by Jonathan Simon of the Alliance for Democracy and secondarily by the above-mentioned Dr. Steven Freeman.

*The reason this third and penultimate version of the exit poll was used instead of the fourth and final version is that the final version was renormalized to agree with the final election results (which is a perfectly legitimate polling technique, as the purpose of exit polls here in the developed democratic world is to elucidate the demographics and motivations of voters rather than to verify that the authorities have fairly tallied the vote). Neither Edison Media Services, Mitofsky International, nor any of the major news organizations (ABC, CBS, CNN, Fox, NBC, and the Associated Press) that were their clients have yet officially released final-but-unrenormalized data. It is expected that this data will finally be released in early 2005. The reason the vote was inferred from the vote-by-gender data is that the polls, as released on the aforementioned CNN website, did not list a bottom-line Bush vs. Kerry vote. Instead, they only listed the vote by various demographic categories.

Is this proof of systematic voting fraud?  Or is this merely proof of systematic exit polling bias?

To answer this, a bit of background is necessary. Someday, I might write this background, but until then, an excellent place to gain the necessary background is the fabulously informative blog of pollster Mark Blumenthal, aka The Mystery Pollster. Mr. Blumenthal frequently posted on the issue of the exit poll anomalies, and he has conveniently indexed all these posts on this page.

But for the moment, please let me be lazy and let’s just cut to the chase.

I think the basic story is this: if we presume that there’s systematic bias and we make the plausible estimate that this bias equals the mean or median discrepancy (which essentially coincide), then the various data values can be attributed entirely to sampling error.

That is, assume for the sake of argument that there is systematic bias. The most often proposed mechanism for such a systematic bias is that Republican-leaning voters agree to fill out exit polls less often than Democrat-leaning voters do (perhaps due to the lower esteem in which Republican-leaning voters seem to hold the media). A plausible estimate of this bias would then be the mean or median discrepancy (i.e., 3.7 or 3.9%, respectively), especially because such a value hardly requires all that much of a difference in how likely Republican-leaning vs. Democrat-leaning voters are to fill out exit polls. Moreover, one might ascribe additional plausibility to this bias estimate because the mean and median discrepancy essentially coincide. That is, one expects from standard central limit theorem arguments that simple sampling error should be Gaussian-like, unimodal, and symmetric, and if one subtracts off a number like 3.7 to 3.9%, one finds essentially unimodal, symmetric scatter about the origin. (Of course, one might counterargue that ascribing additional plausibility based on this fact exposes one's prejudice toward writing off the data as simply sampling error around some systematic bias.)
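To put a rough number on "hardly requires all that much of a difference," here is a minimal sketch. It rests on a simplifying assumption of mine, not on anything the pollsters have published: a two-candidate race in which each candidate's voters agree to be polled at one fixed rate. The function names are hypothetical.

```python
def polled_margin(true_bush_share, bush_response_rate, kerry_response_rate):
    """Bush-minus-Kerry margin the exit poll would report in a two-candidate race."""
    bush = true_bush_share * bush_response_rate
    kerry = (1.0 - true_bush_share) * kerry_response_rate
    return (bush - kerry) / (bush + kerry)


def response_ratio_for_bias(bias, true_bush_share=0.5):
    """Bush/Kerry response-rate ratio that understates Bush's margin by `bias`.

    Assumed model only: solves (b*q - k) / (b*q + k) = target for q,
    where b and k are the true Bush and Kerry shares.
    """
    b, k = true_bush_share, 1.0 - true_bush_share
    target = (b - k) - bias  # margin the biased poll would show
    return k * (1.0 + target) / (b * (1.0 - target))


# Bias values taken from the mean and median discrepancies quoted above.
for bias in (0.037, 0.039):
    ratio = response_ratio_for_bias(bias)
    print(f"bias {bias:.3f} -> Bush/Kerry response-rate ratio {ratio:.3f}")
```

In an evenly split state, this toy model gives a ratio of about 0.93. In other words, Bush voters would only need to be about 7% less likely (in relative terms) than Kerry voters to fill out the questionnaire.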

Next, note that the data is very noisy. The 95% confidence interval for all the data points is estimated to be +/- 6-8%, with the lower value for the swing states, which received larger exit polls than the non-swing states. (NB: The depicted data are the official vs. exit poll discrepancies in the Bush-Kerry victory margins, and not discrepancies in either the individual Bush or Kerry percentages. As such, the correct margin of error for the above plot is double the usually quoted margin of error for the exit poll, which would apply to the individual Bush or Kerry percentages. Moreover, it must be realized that since exit polls are not truly random samples across the whole state, but rather highly clustered samples, they carry a larger standard error than the usual 1/sqrt(N), where N is the sample size. Edison Media Services/Mitofsky International have posted their state-by-state exit poll methodology on their website (note: this link is to an Adobe PDF file). Their methodology estimates that for individual results, the relevant margin of error (i.e., the one applicable to questions decided by even 50/50 margins, as opposed to >75/25 margins) drops from about +/- 4% for their exit polls with N = 950 to about +/- 3% for exit polls with N>2350. Again, for the data depicted above, which are victory margins and not individual numbers, the correct margins are twice these quoted values. For a fuller explanation, see this post by Mark Blumenthal of the blog Mystery Pollster.)
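The arithmetic in that parenthetical can be sketched as follows. The only inputs are the two rounded margins quoted above (+/- 4% at N = 950 and +/- 3% at N = 2350, where I treat the "N>2350" threshold as if it were exactly 2350). The simple-random-sampling formula is the textbook one, and the helper names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def srs_margin_of_error(n, p=0.5):
    """95% margin of error on one candidate's share under simple random sampling."""
    return Z_95 * math.sqrt(p * (1.0 - p) / n)


def implied_cluster_inflation(quoted_moe, n):
    """How much larger a quoted margin of error is than the simple-random-sampling value."""
    return quoted_moe / srs_margin_of_error(n)


def victory_margin_moe(single_share_moe):
    """Margin of error on the Bush-minus-Kerry margin, using the doubling rule in the text."""
    return 2.0 * single_share_moe


# (sample size, quoted margin of error on an individual share)
for n, quoted in [(950, 0.04), (2350, 0.03)]:
    print(
        f"N={n}: SRS {srs_margin_of_error(n):.3f}, quoted {quoted:.3f}, "
        f"inflation {implied_cluster_inflation(quoted, n):.2f}, "
        f"victory-margin MOE {victory_margin_moe(quoted):.2f}"
    )
```

Because the quoted margins are rounded, the implied inflation factors (roughly 1.3 and 1.5) are only indicative. Doubling the quoted values gives the 6-8% band used for the plot.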

Thus, if one subtracts off the plausible bias estimate of 3.7-3.9% from all the data points, then all the values will lie within the 95% confidence interval around zero. Hence, with this bias estimate and with the usual convention for statistical significance, the exit poll data do not represent a statistically significant deviation from the hypothesis that the exit poll data in fact agrees with the official tallies (i.e., the hypothesis that if the exit poll samples were made sufficiently large, they would have indeed converged to the official tallies.)
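The test just described is simple enough to write down. In the sketch below the function name is hypothetical, and the three "states" carry placeholder numbers that I made up purely to exercise the code. They are not read off Figure 1.

```python
def flag_significant(discrepancies, moe_by_state, bias=0.0):
    """Return states whose bias-corrected discrepancy lies outside +/- its margin of error.

    All quantities are in percentage points of the Bush-minus-Kerry margin.
    """
    return sorted(
        state
        for state, discrepancy in discrepancies.items()
        if abs(discrepancy - bias) > moe_by_state[state]
    )


# Placeholder values for illustration only; NOT the data plotted in Figure 1.
example_discrepancies = {"State A": 9.5, "State B": 3.8, "State C": -1.0}
example_moe = {"State A": 8.0, "State B": 6.0, "State C": 7.0}

# Treating the exit poll as unbiased:
print(flag_significant(example_discrepancies, example_moe))
# Subtracting an assumed bias of 3.8 points first:
print(flag_significant(example_discrepancies, example_moe, bias=3.8))
```

With these placeholder inputs, one state is flagged when no bias is assumed and none are flagged once the assumed bias is subtracted. That mirrors the shape of the argument above, not its actual data.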

Of course, if you do believe that the exit poll is unbiased, then the above subtraction is unwarranted and one would conclude that all the data points in the reddest parts of the plot (i.e., above the 6%, 7%, or 8% line, depending on the state) mark statistically significant discrepancies between the exit poll and the official tallies. (Again, “statistically significant discrepancy” is used here with the conventional “lies outside the 95% confidence interval around zero discrepancy” definition.) Interestingly, in this scenario, none of the states with statistically significant discrepancies are swing states. Rather, they’re states from New England, the Tri-State Area, the Great Plains, or the Deep South. (Admittedly, though, the crucial swing states of Ohio and Florida are just within their respective 95% confidence intervals of +/- 6% and +/- 7% around zero… drawing the line of demarcation between significance and insignificance at 95% is just an arbitrary convention, after all.) Therefore, if one assumes the exit poll was unbiased and furthermore attributes the statistically significant discrepancies to fraud, then the data suggest the following scenario, which to the best of my knowledge has not really been considered previously. Namely, nefarious Republicans padded President Bush’s popular vote by throwing out or changing Democratic votes in safely red and safely blue states rather than just stealing vital electoral votes with vile fraud focused on Florida and Ohio.

Where do I come down personally? Well, personally I think the exit polls are probably biased, the mean discrepancy is a plausible estimate of the bias, and thus the entire variation of the data is plausibly attributable to sampling error. Indeed, since the exit poll data is so noisy, if one in fact believes that the official tally was significantly wrong in many states, then one should expect to see at least a few data points with even larger discrepancies than those seen above. One must always remember that sampling error goes both ways. It can spuriously augment observations just as well as it can spuriously diminish them. (Of course, if one believes that there were only shenanigans in a few states and that these were done with some subtlety, then the data is simply too noisy to locate them.)

Nevertheless, I’d still be very curious to hear pollsters explicitly account for why the bias was so large, because nearly 4% bias is pretty darn big for a professional polling organization. If I were a paying customer of Edison Media Services and Mitofsky International like ABC, CBS, CNN, Fox, et cetera, I’d be plenty PO’ed.


For all you regular readers wondering about the sharp falloff in postings of late, it was sadly necessitated by my need to do my actual PhD work in quantum computation, namely my thesis proposal and oral exams.   Happily, they’re all done now, and I thus may now pursue some much-needed procrastination.   

First on deck is this election thingy that happened about a month ago and its many alleged irregularities. 

Stay tuned.