## Systematic Voting Fraud or Systematic Exit Polling Bias?

### December 11, 2004

[*A preliminary version of this post appeared December 07, 2004 at 9:25 PM. A second version appeared December 09, 2004 at 12:22 AM. This is the final version, incorporating this comment by Mark Blumenthal (who runs the blog Mystery Pollster, having earned that moniker from Slate.com’s Mickey Kaus): namely, exit polls, due to their cluster design, possess larger standard errors than the 1/sqrt(N) associated with truly random sampling. The conclusion in all versions was the same, and indeed the case has become stronger with each revision. Namely, with a plausible estimate of the exit poll bias, the entire variation of the data can be ascribed to random sampling error.*]

The past week has been marked by a court challenge to the just-certified vote tally in Ohio as well as informal hearings by the Democratic members of the House Judiciary Committee devoted to investigating voting irregularities in Ohio. (Archived audio of this session is currently available here from Pacifica Radio.) As such, I believe it is an appropriate time to revisit the claims of various statistically significant irregularities in pre-election and exit polling that have been promulgated over the last month.

In this post, I shall revisit the exit poll irregularity, namely the overwhelming tendency for President Bush to poll better in the official tally than he did in the exit poll:

First of all, I thank Dr. Steven Freeman, Resident Scholar at the University of Pennsylvania and witness at the informal hearings of the House Judiciary Democrats held on 8 December, for sharing his compilation of the depicted exit poll data as well as imparting some helpful comments. Any errors in the plot or the discussion below should be considered my own, and not Dr. Freeman’s.


**Figure 1.** Comparison of the Bush-Kerry victory margins in the official vote tallies versus those inferred from the vote-by-gender data in the Edison Media Services and Mitofsky International exit poll as this poll stood in the **third of the four versions**\* (i.e., circa 12:20 AM on Wednesday, 3 November 2004) released on the CNN Election 2004 exit poll web page, which were archived primarily by Jonathan Simon of the Alliance for Democracy and secondarily by the above-mentioned Dr. Steven Freeman.

*\*The reason this third and penultimate version of the exit poll was used instead of the fourth and final version is that the final version was renormalized to agree with the final election results (a perfectly legitimate polling technique, as the purpose of exit polls here in the developed democratic world is to elucidate the demographics and motivations of voters rather than to verify that the authorities have fairly tallied the vote). Neither Edison Media Services, Mitofsky International, nor any of the major news organizations (ABC, CBS, CNN, Fox, NBC, and the Associated Press) that were their clients have yet officially released final-but-unrenormalized data; this data is expected to be released in early 2005. The vote was inferred from the vote-by-gender data because the polls, as released on the aforementioned CNN website, did not list a bottom-line Bush vs. Kerry vote. Instead, they only listed the vote by various demographic categories.*

Is this proof of systematic voting fraud? Or is this merely proof of systematic exit polling bias?

To answer this, a bit of background is necessary. Someday, I might write this background, but until then, an excellent place to gain the necessary background is the fabulously informative blog of pollster Mark Blumenthal, aka The Mystery Pollster. Mr. Blumenthal frequently posted on the issue of the exit poll anomalies, and he has conveniently indexed all these posts on this page.

But for the moment, please let me be lazy and let’s just cut to the chase.

**I think the basic story is this: if we presume that there’s systematic bias and we make the plausible estimate that this bias equals the mean or median discrepancy (which essentially coincide), then the various data values can be attributed entirely to sampling error.**

That is, assume for the sake of argument that there is systematic bias. The most often proposed mechanism for such a bias is that Republican-leaning voters agree to fill out exit polls less often than Democrat-leaning voters do (perhaps due to the lower esteem in which Republican-leaning voters seem to hold the media). A plausible estimate of this bias would then be the mean or median discrepancy (i.e., 3.7% or 3.9%, respectively), especially because such a value hardly requires much of a difference in the likelihood of Republican-leaning vs. Democrat-leaning voters filling out exit polls. Moreover, one might ascribe additional plausibility to this bias estimate because the mean and median discrepancy essentially coincide. That is, one expects from standard central limit theorem arguments that simple sampling error should be Gaussian-like (unimodal and symmetric), and if one subtracts off a number like 3.7 to 3.9%, one finds essentially unimodal, symmetric scatter about the origin. (Of course, one might counterargue that ascribing additional plausibility based on this fact exposes a prejudice toward writing off the data as simply sampling error around some systematic bias.)
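To make the bias-estimate idea concrete, here is a minimal Python sketch. The discrepancy values below are made-up illustrative numbers, not Dr. Freeman’s actual compilation; the point is only the procedure of taking the mean or median as the bias estimate and checking that the residuals scatter symmetrically about zero.

```python
import statistics

# Hypothetical state-level discrepancies in the Bush-Kerry victory margin
# (official minus exit poll), in percentage points. Illustrative only.
discrepancies = [6.5, 4.1, 3.9, 5.2, 2.8, 3.7, 1.0, -0.4, 4.6, 3.3]

# Two plausible bias estimates; in the real data these essentially coincide.
bias_mean = statistics.mean(discrepancies)      # ~3.5 here
bias_median = statistics.median(discrepancies)  # 3.8 here

# Subtracting the bias estimate recenters the scatter at zero; if the rest
# is sampling error, the residuals should look roughly symmetric.
residuals = [d - bias_median for d in discrepancies]
```

The question of whether the recentered residuals then fit within the sampling-error confidence interval is the substantive one; this snippet only shows the recentering step.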

Next, note that the data is very noisy. The 95% confidence interval for the data points is estimated at +/- 6 to 8%, with the lower values applying to the swing states, which received larger exit poll samples than the non-swing states. (NB: The depicted data are the official vs. exit poll discrepancies in the Bush-Kerry victory margins, and not discrepancies in either the individual Bush or Kerry percentages. As such, the correct margin of error for the above plot is *double* the usually quoted margin of error for the exit poll, which applies to the individual Bush or Kerry percentages. Moreover, it must be realized that since exit polls are not truly random samples across the whole state, but rather highly clustered samples, they carry a larger standard error than the usual 1/sqrt(N), where N is the sample size. Edison Media Services/Mitofsky International have posted their state-by-state exit poll methodology on their website (note: this link is to an Adobe PDF file). Their methodology estimates that for individual results, the relevant margin of error (i.e., the one applicable to questions decided by near 50/50 margins, as opposed to >75/25 margins) drops from about +/- 4% for their exit polls with N = 950 to about +/- 3% for exit polls with N > 2350. Again, for the data depicted above (victory margins, not individual numbers), the correct margins are twice these quoted values. For a fuller explanation, see this post by Mark Blumenthal of the blog Mystery Pollster.)
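As a rough sketch of how margins of this size arise, the snippet below computes a clustered-poll margin of error by inflating the simple-random-sampling formula with an assumed design effect, then doubles it for a victory-margin discrepancy. The design-effect value 1.7 is my illustrative assumption, not Edison/Mitofsky’s published figure.

```python
from math import sqrt

def margin_of_error(n, deff=1.7, p=0.5, z=1.96):
    """95% margin of error (percentage points) for one candidate's share
    in a clustered exit poll of n respondents.

    deff is an ASSUMED design effect for cluster sampling; the true value
    depends on the poll's precinct design.
    """
    return z * sqrt(deff * p * (1 - p) / n) * 100

# Individual-percentage MOE, then doubled for a victory-margin discrepancy.
moe_individual = margin_of_error(950)
moe_margin = 2 * moe_individual
```

With these assumed inputs the sketch lands in the right neighborhood: roughly +/- 4% at N = 950 for an individual percentage, hence roughly +/- 8% for a margin discrepancy, shrinking as N grows.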

Thus, if one subtracts off the plausible bias estimate of 3.7-3.9% from all the data points, then all the values will lie within the 95% confidence interval around zero. Hence, with this bias estimate and with the usual convention for statistical significance, the exit poll data do not represent a statistically significant deviation from the hypothesis that the exit poll data in fact agrees with the official tallies (i.e., the hypothesis that if the exit poll samples were made sufficiently large, they would have indeed converged to the official tallies.)

Of course, if you do believe that the exit poll is unbiased, then the above subtraction is unwarranted and one would conclude that all the data points in the reddest parts of the plot (i.e., above the 6%, 7%, or 8% line, depending on the state) mark statistically significant discrepancies between the exit poll and the official tallies. (Again, “statistically significant discrepancy” is used here with the conventional “lies outside the 95% confidence interval around zero discrepancy” definition.) Interestingly, in this scenario none of the states with statistically significant discrepancies are swing states. Rather, they’re states from New England, the Tri-State Area, the Great Plains, or the Deep South. (Admittedly, though, the crucial swing states of Ohio and Florida are just within their respective 95% confidence intervals of +/- 6% and +/- 7% around zero… drawing the line of demarcation between significance and insignificance at 95% is just an arbitrary convention, after all.) Therefore, if one assumes the exit poll was unbiased and furthermore attributes the statistically significant discrepancies to fraud, then the data suggests the following scenario, which, to the best of my knowledge, has not really been considered previously. Namely, nefarious Republicans padded President Bush’s popular vote by throwing out or changing Democratic votes in safely red and safely blue states rather than just stealing vital electoral votes with vile fraud focused on Florida and Ohio.

Where do I come down personally? Well, personally I think the exit polls are probably biased, the mean discrepancy is a plausible estimate of the bias, and thus the entire variation of the data is plausibly attributable to sampling error. Indeed, since the exit poll data is so noisy, if one in fact believes that the official tally was significantly wrong in many states, then one should expect to see at least a few data points with even larger discrepancies than those seen above. One must always remember that sampling error goes both ways. It can spuriously augment observations just as well as it can spuriously diminish them. (Of course, if one believes that there were only shenanigans in a few states and that these were done with some subtlety, then the data is simply too noisy to locate them.)

Nevertheless, I’d still be very curious to hear pollsters explicitly account for why the bias was so large, because nearly 4% bias is pretty darn big for a professional polling organization. If I were a paying customer of Edison Media Services and Mitofsky International like ABC, CBS, CNN, Fox, et cetera, I’d be plenty PO’ed.

December 14, 2004 at 10:20 pm

Exits: Were They Really “Wrong?” Last week’s posting of more detailed information on the sampling error of exit polls by the National Election Pool (NEP) allows for a quick review of the now well-established conventional wisdom that the…

December 17, 2004 at 12:49 pm

Hi. I added a comment on this to Blumenthal’s site, but thought I should write some more here. One reason is that, unlike the post I made while a bit sleepy last night, I’ve now had some time to mull things over a bit more. When I first approached your post, I went to the figure, the legends, and the links to the figure to try to assess it. I didn’t read the rest of your post yet, figuring it was better to match the conclusions to the raw data first. So your post did answer at least one of my questions. Other questions remain unanswered.

There was the small problem with the median having 24 below and 26 above (and, appropriately, one on the median). It would seem the median should be shifted up one state. But then, going to the referenced papers, the numbers given for Simon’s red shift (as it was described) don’t match the locations on your plot. They are usually double the numbers Simon has in the linked document, although not always. For example, Simon lists 2.5% for AZ; you have about 4.4% on the graph. For those states for which Freeman does give numbers, your plot seems to match Freeman’s numbers (with the exception that Freeman has New Mexico at 3.7% and you have it at 3.9%).

In my post I mentioned your +/- 8% margin of error. You give the reason why in your write-up. But… the reason you give seems to be wholly wrong. You add together the margins of error of each candidate. These are not independent determinations; in fact, because (% in favor of Kerry) is nearly equal to (100 – % in favor of Bush), these are almost entirely dependent calculations. In fact, the two samplings reinforce each other by increasing the sample size, and it would seem to me that would mean the margin of error would be less than +/- 4%. A second reason it is slightly less is that the numbers are better for 12 key states. Let’s say that moves the overall margin of error to about +/- 3.7%. Then the median is outside the margin of error. If that is the case, we have some very skewed data.

Sincerely,

Martin Hill, PhD

Here is the post I made at the Blumenthal site. Some of it is repeated above, some has been answered by a better reading on my part:

Okay, I like to think I’m not a dummy. At least sometime in the recent past I wasn’t one; I’m getting older now. But I clicked on the link to William Kaminski’s blog and especially his chart. I like his chart; nice eye candy. I would have toned down the reds and blues to let the other data stand out a bit more.

But… there are some minor quibbles. He puts 26 states above the median and 24 below (and, of course, one on the median). I mention that first because I noticed it first. Fine, that would bump the median up barely a notch.

Then I decided to look at some of the data he referenced. He said he referenced Simon’s data first and then Freeman’s. He gave links. Checking these links, the points he plotted were always Freeman first. Then, when it was Freeman’s data only (the majority of the points, at least on the links that were referenced), other points were far off what was presented in Freeman’s red shift data. For example, Simon describes a 4.2% red shift for Alabama. The map has double that. Arizona has 2.5% according to Freeman, but is plotted at about 4.4% on the map.

Kaminski shows a margin of error of +/- 8 points compared to the +/- 4 Blumenthal refers to. His mean discrepancy is 3.7%, while Blumenthal is saying nationwide 1.7%.

Am I missing something here? Sorry if I am casting aspersions if it turns out that it is simply that I don’t understand. Perhaps the links he provides are not the most recent ones. He plots data for NJ, NY, NC, and VA, which are not included in either link.

Thanks for allowing me to voice these perplexities.

md hill

December 17, 2004 at 9:15 pm

In response to Dr. Hill:

There appear to be a few innocent misunderstandings stemming from the fact that others (e.g., Mr. Simon and, on occasion, Dr. Freeman) have written about the discrepancies between the exit polls and the official count in a way different from how I have.

1) First and foremost, Simon and, on occasion, Freeman chart the “red shift”, the discrepancy in the individual Bush percentages between the official count and the exit poll, that is:

(Official Bush %) – (Exit Poll Bush %)

What I plot is not this but rather the discrepancy in the Bush-Kerry victory margins between the official count and the exit poll, that is:

[(Official Bush %) – (Official Kerry %)] – [(Exit Poll Bush %) – (Exit Poll Kerry %)]

which, not surprisingly, is usually about double the “red shift”.

To take your example of Arizona, Dr. Freeman (whose data as of December 1st is what I use) lists

Bush Official Count… 55.3%

Bush Exit Poll………. 52.8%

Kerry Official Count… 44.7%

Kerry Exit Poll………. 46.7%

So we see the “red shift” for Arizona is:

(55.3% – 52.8%) = 2.5%

which agrees with Simon’s list (and you can see that Simon’s list also agrees with the respective Arizona exit poll numbers of 52.8% Bush and 46.7% Kerry).

In contrast, what I plot, the discrepancy in the victory margin is

(55.3% – 44.7%) – (52.8% – 46.7%) = (10.6%) – (6.1%) = 4.5%

which is what you see on my plot.
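The same arithmetic can be written out in a few lines of Python, using the Arizona figures quoted above, to make plain how the two definitions differ:

```python
# Arizona figures from Dr. Freeman's December 1st compilation, as quoted above.
bush_official, bush_exit = 55.3, 52.8
kerry_official, kerry_exit = 44.7, 46.7

# Simon's "red shift": discrepancy in the individual Bush percentage.
red_shift = bush_official - bush_exit  # 2.5

# What my plot shows: discrepancy in the Bush-Kerry victory margin.
margin_discrepancy = (bush_official - kerry_official) - (bush_exit - kerry_exit)
# (10.6) - (6.1) = 4.5, roughly double the red shift.
```

Running it gives a red shift of 2.5% (matching Simon’s list) and a margin discrepancy of 4.5% (matching my plot).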

2) The conventional way a 95% confidence interval or “margin of error” is defined is for a single number, e.g., the (Exit Poll Bush %) by itself or the (Exit Poll Kerry %) by itself. If one subtracts these two numbers to calculate the Bush – Kerry exit poll victory margin, then one is subtracting two numbers that each possess random error due to finite sample size. As you write, the key issue is then how these random errors are correlated, and these errors should be essentially exactly anticorrelated: ignoring the small number of votes going to 3rd-party candidates or purposeful no-votes, a sample whose finite size gives President Bush a spurious advantage of +X% over his true count must give Senator Kerry a spurious disadvantage of -X% off his true count. Therefore, I believe it’s correct to multiply the 95% confidence interval estimates by the exit pollsters (Edison/Mitofsky) by a factor of 2.

[Note that in races with more than 2 serious candidates, or in pre-election polls with a significant number of Don’t Know / No Answer respondents, it’s not right to consider the random errors of the two numbers subtracted to make a victory margin as completely anticorrelated. The American Statistical Association, in their helpful pamphlet “What is a Margin of Error?” (which Mark Blumenthal links to in his “What is the Sampling Error for the Exit Polls?” post on his Mystery Pollster blog), lists on page 11 the rule of thumb that the margin of error for a victory margin should be 1.7 times the published margin of error for a single number. Note that 1.7 is a halfway compromise between the square-root-of-2 (=1.4142…) factor appropriate for uncorrelated errors and the factor of 2 appropriate for perfectly anticorrelated errors.]
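A quick Monte Carlo sketch makes the anticorrelation argument concrete. In a strictly two-way race the Kerry percentage is determined by the Bush percentage, so the victory margin’s sampling error is exactly twice the single-candidate error; the simulation below (with assumed, illustrative values of N and the true split) just illustrates this. With third-party or no-answer respondents breaking the perfect anticorrelation, the ratio would fall toward the 1.7 rule of thumb.

```python
import random

random.seed(0)
N, p_true, trials = 950, 0.5, 2000  # illustrative poll size and true split

bush_errors, margin_errors = [], []
for _ in range(trials):
    # Simulate one exit poll of N voters in a strictly two-way race.
    bush_pct = 100.0 * sum(random.random() < p_true for _ in range(N)) / N
    kerry_pct = 100.0 - bush_pct              # perfectly anticorrelated
    bush_errors.append(bush_pct - 100.0 * p_true)
    margin_errors.append(bush_pct - kerry_pct)  # true margin is 0 here

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

ratio = sd(margin_errors) / sd(bush_errors)  # ~2 for a pure two-way race
```

Because kerry_pct is exactly 100 minus bush_pct here, the ratio comes out at 2 regardless of the seed; the randomness only sets the absolute size of the sampling errors.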

3) The graph was made in Microsoft Excel. Excel calculates the median discrepancy in the victory margin as 3.7%. The fact that the line on the plot labelled “Median Discrepancy: 3.7%” is in fact just a tad low is an artifact of my drawing the line on top of the graph with the drawing toolbar in Microsoft Excel rather than figuring out how to get Excel to plot the line in the graph itself.

Hope this helps. I apologize for any confusions caused.

December 17, 2004 at 9:29 pm

Two more points:

1) Oops, in my last comment I confused the mean and median. The median discrepancy calculated by Excel is 3.9%, not 3.7%, which is the mean discrepancy. The reason the line on the graph looks a little off is the same, though: my inadequate skills in Microsoft Excel (the line, once again, was drawn on top of the graph with the drawing toolbar, and not plotted within the graph by Excel itself).

2) As for the tiny discrepancy you note in the New Mexico data between Dr. Freeman’s paper and my graph:

Once again, I am using Dr. Freeman’s data as of December 1st. This December 1st data is mildly revised compared to data he used earlier in November. Dr. Freeman claims these mild revisions are the result of the fact that some of the earlier data was already partially contaminated by the “renormalization” process in which the data was altered to agree with official results.

December 17, 2004 at 10:05 pm

Thanks, that answers most of my questions. I think one of the main problems is that the links provided beneath the graph lead to the sets of numbers I described. There are no links to the data as you present it?

Secondly, I am not utterly convinced of your argument for adding the confidence intervals. Unfortunately, the size of the confidence interval is the key to your argument that not much is outside the expected 95% range.

It seems that the same numerical results will essentially be had by using either the Kerry or Bush numbers since the Kerry numbers define the Bush numbers and vice versa.

martin hill

December 17, 2004 at 11:55 pm

Indeed, I would have preferred to have a link to the raw data on this blog. However, I respect Dr. Freeman’s apparent wish to keep track of who’s using his data and calculations. (To elaborate: In November, Dr. Freeman used to have a direct link to his own Excel spreadsheets which contained the raw data. Now however, he has the following notice on his Exit Poll Discrepancy page:

**************

Excel Spreadsheets of CNN “Uncalibrated” Exit Poll Data from Election Night ’04. 49 states + DC. Write to me* if you would like this — or if you previously downloaded the data that was posted here; there were a few states for which I had posted data which was already corrected. Now I have the uncalibrated data for these states.

**************

*Bill’s Note: Dr. Freeman’s e-mail for his exit poll extracurricular activities is sf@alum.mit.edu .

So if you’re curious and want to do your own analysis, please e-mail him.)

Speaking of extracurricular activities, I’d be remiss not to remind all my readers that doing such poll analysis is also an extracurricular activity for me. If you’re curious about seeing some exit poll discrepancy analysis by people who are explicitly getting paid to analyze the exit polls and other Election 2004 issues, please check out this report by the Caltech-MIT Voting Technology Project. (Note that this report marks the Voting Technology Project’s revision of their earlier erroneous report on the exit polls, which almost uniformly used exit poll data that had been renormalized to agree with the official results. The revised report includes a full analysis of the Simon/Freeman data versus the distribution of voting technologies across states and has already been briefly discussed in this post by Mr. Blumenthal on his Mystery Pollster blog.)

(And if you’re curious about the raw data or other aspects of this study, please contact MIT Professor Charles Stewart at cstewart@MIT.EDU)

December 18, 2004 at 1:31 pm

A Not-Quite-Random Walk: Non-blog activities will take priority today, and yet there is so much to talk about.

February 26, 2005 at 12:02 pm

I am doing research for a class project on the 2004 exit polls and need statistics on the Senate races in Arizona, Arkansas, and South Dakota, preferably the last polls taken on 11/2/04. Any help would be greatly appreciated.

ctamado@csupomona.edu

August 21, 2005 at 10:21 am

Democratizing America: There has been a growing fear in this country that its citizens are having less and less influence on the role of government. There are three pillars of our democracy that elicit concern about being corruptible.
