Election Stats

January 30, 2008

How will Edwards’ supporters vote?

Filed under: edwards, florida, new hampshire, obama, south carolina — Brian @ 11:07 pm

I’ve assumed that Edwards and Obama draw from the same pool of voters, so that Edwards has been pulling votes away from Obama. But maybe Edwards and Clinton are both fighting over the traditional Dems, while Obama is getting a lot of Independent support? Or, in South Carolina, even though Obama had a great day, one observation from the exit polling was that if you looked only at white voters, Edwards would have won, with an estimated 40% of the white vote (Clinton got ~36% and Obama got ~24%). So maybe Edwards’ voters are uncomfortable voting for Obama? Will they all vote for Hillary?

Well, we can look at the exit polls for some clues. Unfortunately, they never ask: “If you didn’t vote for your #1 candidate, whom would you vote for?” However, the New Hampshire, South Carolina and Florida exit polls include some tangential questions about how voters feel about the other candidates.

But it is actually kind of hard to read exit poll results. You have to know which rows and columns add up to 100% (or close to it) and which ones are breakdowns of the others. For example, here is one table from the South Carolina exit poll (which combines two questions) asking about the gender and race of the voter:

  % Total Clinton Edwards Kucinich Obama
Black male 20 17 3 0 80
Black female 35 20 2 0 78
White male 19 29 44 0 27
White female 27 42 35 0 22

The proper way to convert that data table into a sentence or paragraph is like this: Of the voters in the SC primary, 20% were black males, 35% were black females, 19% were white males and 27% were white females. Of the white males (for example), 44% voted for Edwards, 29% voted for Clinton and 27% voted for Obama; on the other hand, of the white females, 42% voted for Clinton and only 35% and 22% voted for Edwards and Obama, respectively.

Easy, right? Each row (excluding the first column, % Total) adds up to 100%, give or take rounding. That is why we can write the sentences above: each row is a breakdown of that particular category.
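Because every row is a complete breakdown of its own group, rows can also be pooled by weighting each one by its share of the electorate. Here is a minimal R sketch (the variable names are my own) that estimates the SC white vote from the two white rows of the table above:

```r
# Share of the SC electorate in each row ("% Total"), from the table above
share <- c(white_male = 19, white_female = 27)

# Candidate split (%) within each row, from the same table
split_pct <- rbind(
  white_male   = c(Clinton = 29, Edwards = 44, Kucinich = 0, Obama = 27),
  white_female = c(Clinton = 42, Edwards = 35, Kucinich = 0, Obama = 22)
)

# Weighted average of the rows: R recycles the length-2 vector down each column,
# so row i is multiplied by share[i]; then divide by the total weight
white_vote <- colSums(split_pct * share) / sum(share)
round(white_vote, 1)
# Clinton 36.6, Edwards 38.7, Kucinich 0.0, Obama 24.1
```

Edwards comes out at about 39% of the white vote, in line with the estimated 40% quoted at the top of this post. The white female row sums to 99% because of rounding, so the pooled shares add up to slightly less than 100.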

However, sometimes we don’t want to phrase our explanation that way. Sometimes we want to know which answers were chosen by the supporters of a particular candidate. But none of the candidate columns add up to anything useful! To demonstrate the problem, it helps to look at a table where the answers (rows) are not evenly distributed.

Here is another exit poll question from SC: “Do you think this country is ready to elect a black president?”

  % Total Clinton Edwards Kucinich Obama
Ready 77 21 16 0 63
Not ready 22 48 29 0 23

So, 77% of voters think the country is ready to elect a black president and 22% think we’re not ready. Looking at the table, you might be tempted to conclude that more of Edwards’ voters think the country is not ready (29 is higher than 16, right?). Wrong. The 29 is a percentage of the 22% who answered “not ready,” so Edwards voters who think we are not ready make up 29% of 22%, or only about 6%, of all the voters. Likewise, Edwards voters who think we are ready make up 16% of 77%, or about 12%, of all the voters. Since about 18% of all the voters chose Edwards, MOST of the Edwards voters actually think we ARE ready for a Black President.

To make these exit poll tables more intuitive, I like to convert them to a different format. Imagine that there are exactly 100 voters, and that each number in the table is the number of voters with that particular candidate and answer combination. To convert the table, multiply each percentage in a candidate column by the percentage of people choosing that row’s answer, and divide by 100. After you convert the normal exit poll table to a 100-person (or 100%) table, it looks like this:

  Clinton Edwards Kucinich Obama
USA Ready for a Black President 16 12 0 49
USA Not ready for a Black President 11 6 0 5

Now this can be understood fairly easily. The 100 voters (or 100%) are distributed among all the boxes in the table. The rows add up to the total percentage choosing each answer in the poll, and the columns add up to the total percentage each candidate received. The table reads as follows: of all the voters in the primary, 12% voted for Edwards and also thought the US is ready for a Black President, and 6% voted for Edwards but thought the US is NOT ready for a Black President (so 2/3 of the Edwards voters, 12 out of 18, thought we are ready for a Black President).
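The conversion is just “answer share times candidate share within that answer, divided by 100.” A minimal R sketch of it (the function name to_100_person is my own), applied to the SC “black president” table, reproduces the converted table above:

```r
# Convert an exit-poll table whose rows each sum to ~100% into the "100-person" format.
# answer_pct: % of all voters giving each answer (the "% Total" column),
#             in the same order as the rows of row_pct
# row_pct:    candidate percentages within each answer (rows = answers)
to_100_person <- function(answer_pct, row_pct) {
  # R recycles answer_pct down each column, so row i is scaled by answer_pct[i]
  row_pct * answer_pct / 100
}

answer_pct <- c(Ready = 77, `Not ready` = 22)
row_pct <- rbind(
  Ready       = c(Clinton = 21, Edwards = 16, Kucinich = 0, Obama = 63),
  `Not ready` = c(Clinton = 48, Edwards = 29, Kucinich = 0, Obama = 23)
)

people <- to_100_person(answer_pct, row_pct)
round(people)    # Ready: 16 12 0 49;  Not ready: 11 6 0 5
rowSums(people)  # 77 and 22: the share giving each answer
colSums(people)  # each candidate's share of all voters (Edwards about 18.7)
```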

(There is another way to convert these tables, which is to make each COLUMN add up to 100%, but then the rows are hard to interpret. I like the format shown here because you can intuitively understand both the rows and the columns, even though you might have to do some extra calculations in your head to get some summary percentages.)
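For comparison, here is the column-normalized alternative as a short R sketch, using base R’s prop.table() on the rounded 100-person table for the SC “black president” question (variable names are my own). It answers “what share of each candidate’s voters gave this answer?” directly, but the rows no longer add up to anything meaningful:

```r
# Rounded 100-person table for the SC "black president" question (from above)
people <- rbind(
  Ready       = c(Clinton = 16, Edwards = 12, Kucinich = 0, Obama = 49),
  `Not ready` = c(Clinton = 11, Edwards = 6,  Kucinich = 0, Obama = 5)
)

# Drop candidates with no voters first (Kucinich here), since 0/0 would give NaN
nonempty <- people[, colSums(people) > 0, drop = FALSE]

# Column-normalize: divide each column by its own total, so each column sums to 100%
by_candidate <- prop.table(nonempty, margin = 2) * 100
round(by_candidate)
# Edwards: 67% Ready, 33% Not ready; Clinton: 59% / 41%; Obama: 91% / 9%
```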

Here are some other exit poll results from South Carolina, Florida and New Hampshire, converted to the 100-person format, that relate to how voters feel about Clinton and Obama:

Do you think this country is ready to elect a woman president?

  Clinton Edwards Kucinich Obama
Ready 25 13 0 38
Not ready 2 6 0 15

Summary from South Carolina: A large majority (more than 2/3) of both Edwards and Obama voters think that the country is ready for a woman president. And, a majority of Edwards and Clinton voters also think the US is ready for a Black President (see above).

Do you think this country is ready to elect a black president?

  Clinton Edwards Kucinich Obama
Ready 32 9 1 30
Not ready 16 6 0 3

Do you think this country is ready to elect a female president?

  Clinton Edwards Kucinich Obama
Ready 46 8 0 26
Not ready 2 6 0 7

No matter how you voted today, how would you feel if Hillary Clinton wins the nomination:

  Clinton Edwards Kucinich Obama
Satisfied 48.8 6.4 0.8 24.0
Dissatisfied 0.6 7.6 0.2 9.8

No matter how you voted today, how would you feel if Barack Obama wins the nomination:

  Clinton Edwards Kucinich Obama
Satisfied 29.4 7.0 0.7 32.9
Dissatisfied 18.9 7.5 0.3 0.9

Summary of Florida: Edwards voters in Florida are less willing than those in SC to believe that the US is ready for a black or woman President, and they are roughly evenly split between satisfaction and dissatisfaction with either Clinton or Obama winning the nomination.


Is your opinion of Hillary Clinton:

  Biden Clinton Dodd Edwards Gravel Kucinich Obama Richardson
Favorable 0.0 37.7 0.0 11.1 0.0 0.7 19.2 3.0
Unfavorable 0.3 0.3 0.0 6.5 0.0 0.5 16.0 1.5

Is your opinion of Barack Obama:

  Biden Clinton Dodd Edwards Gravel Kucinich Obama Richardson
Favorable 0.0 26.0 0.0 15.1 0.0 1.7 35.3 4.2
Unfavorable 0.3 11.4 0.0 2.4 0.0 0.3 0.2 0.6

Summary of New Hampshire: 11.1% of NH voters voted for Edwards and have a favorable opinion of Hillary, while 6.5% voted for Edwards and have an unfavorable opinion of her (i.e. 6.5 out of 17.6, or 37%, of Edwards voters have an unfavorable opinion of Clinton). On the other hand, only 14% of Edwards voters (2.4 out of 17.5) have an unfavorable opinion of Obama.

All of these converted questions are collected in a Google Spreadsheet.

Overall conclusion? I think Edwards’ supporters are somewhat evenly split between Clinton and Obama, but I think Obama will gain slightly more than Clinton from Edwards dropping out of the race, mainly because Edwards voters in New Hampshire viewed Clinton unfavorably more often than Obama (37% vs. 14%).


January 20, 2008

Real Statisticians look at the NH primary

Filed under: diebold, new hampshire — Brian @ 9:36 pm

Some professional statisticians have taken a look at this data and concluded that there is no significant correlation between the counting method and Clinton winning once other variables are taken into account. These other variables include location, past voting, affluence, and others.


January 14, 2008

Vote Counting methods, drawn on a NH map

Filed under: diebold, new hampshire — Brian @ 12:48 pm

The observation that the Diebold optical vote scanners may have a bias against Obama (and/or in favor of Clinton) is disturbing. But before claiming fraud, we need to take a more careful look at the data. Perhaps the townships that use the scanners are generally larger, and the larger townships tend to like Clinton better? Or maybe the towns with Diebold machines are more conservative (or more liberal) and vote differently for Clinton? Or perhaps other socio-economic factors correlate with the use of the Diebold machines? Many of these explanations have been proposed, but none seems to negate the vote counting effect.
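The first worry (bigger towns use scanners, and bigger towns like Clinton) is classic confounding. A toy R sketch with made-up towns, not NH data, shows how it can manufacture a “scanner effect”: Clinton’s share depends only on town size, only the large towns use the scanner, and the method coefficient disappears once size enters the model. The 1000-vote cutoff and the linear size effect are arbitrary assumptions for illustration.

```r
# Made-up illustration, NOT real NH data: 40 towns with 50 to 2000 Democratic votes
dem_size <- seq(50, 2000, by = 50)

# Assumption for the toy: only towns above 1000 Democratic votes use the scanner
method <- factor(ifelse(dem_size > 1000, "Diebold", "Hand"), levels = c("Hand", "Diebold"))

# Clinton's share depends ONLY on town size; the counting method has no effect at all
cliv <- 0.35 + 0.00005 * dem_size

towns <- data.frame(dem_size, method, cliv)

naive    <- lm(cliv ~ method, data = towns)             # counting method alone
adjusted <- lm(cliv ~ dem_size + method, data = towns)  # method, controlling for size

coef(naive)[["methodDiebold"]]     # 0.05: a spurious 5-point "Diebold effect"
coef(adjusted)[["methodDiebold"]]  # essentially 0 once size is in the model
```

With real data the noise is larger and the confounders are not known in advance, which is why size, location and method have to be looked at together rather than one at a time.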

One fairly obvious variable that has not been checked is location. The experts seem to think the results make sense given what they know about the geography of New Hampshire’s towns. But while I agree that experts and pundits know that different parts of the state vote for different candidates, no one seems to have looked at how the vote counting methods themselves are distributed across the state (which is the main issue).

NH map of vote counting methods

Figure legend: The map on the left shows where hand counting and machine counting are used; the map on the right shows where the small, medium and big townships are located, plus the locations of hand and machine counting for medium-sized towns (500-800 Democratic votes). (I updated this map on Jan 17th with better data.)

My first observation from looking at the left map was: “That explains it! All the towns that use Diebold machines are in the southeast of the state! If it were a patchwork of man vs. machine, fraud would be more likely, but now I think this is all just a location effect.”

But then I made the map on the right and noticed that the larger towns are also in the SE of the state, which agrees with the previously observed strong correlation between the size of the township and the vote counting method.

So when we look at just the medium-sized towns, for which the Diebold pro-Clinton bias still exists, the map is a patchwork! In other words, the mid-sized towns are not geographically grouped by vote counting method, yet the Diebold bias still shows up in that set of towns. So maybe there really is some fraud there?
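A minimal R sketch of how the mid-sized towns could be tested on their own: keep only towns with 500-800 Democratic votes and compare Clinton’s share under hand and Diebold counting with a Welch t-test. It assumes a town table with the same columns as nhv2 in the R code of my raw-data post (method, dem_size, cliv); the function name midsize_test and the example rows are made up.

```r
# Compare Clinton's share (cliv) between hand-counted and Diebold towns within one size band.
# Assumed columns: method ("Hand"/"Diebold"), dem_size (Democratic votes), cliv (Clinton share)
midsize_test <- function(towns, lo = 500, hi = 800) {
  mid <- towns[towns$dem_size >= lo & towns$dem_size < hi, ]
  mid$method <- droplevels(factor(mid$method))
  # t.test() with a two-level grouping factor runs a Welch two-sample t-test by default
  t.test(cliv ~ method, data = mid)
}

# Invented rows, only so the snippet runs; substitute the real town table
towns <- data.frame(
  method   = c(rep("Hand", 5), rep("Diebold", 5), "Diebold", "Hand"),
  dem_size = c(520, 560, 610, 700, 790, 505, 580, 650, 720, 760, 1500, 300),
  cliv     = c(0.36, 0.38, 0.35, 0.40, 0.37, 0.39, 0.41, 0.38, 0.42, 0.40, 0.90, 0.10)
)

res <- midsize_test(towns)   # the last two rows fall outside 500-800 and are ignored
res$estimate                 # mean Clinton share per counting method within the band
res$p.value
```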

As you have probably heard, the NH Secretary of State will do a manual recount if Kucinich and Albert Howard can pay the estimated cost. I don’t know if the NH SOS allows this, but perhaps it would be more affordable to recount only the mid-sized towns, since that is really the only place where this potential bias is reliably detectable. In the small and big towns, both the favorite candidate and the vote counting technique already go along with town size and location, but the medium towns seem to be free of those confounding variables.

[Finally, if you want to repeat this analysis, please feel free to use the data and code that I have used]

NOTE: on Jan 17, I updated the image with accurate counting method data, and better name assignments between Census and NH SOS towns.

Raw Data for NH Primary – DIY stats

Filed under: diebold, new hampshire — Brian @ 12:14 pm

This post will keep track of any work that I have done, and anyone is welcome to use it and do a better job than me.

Here is a spreadsheet with the vote totals and counting methods:


(It has been updated a few times; the earliest version had errors inherited from my source. I wrote a script to parse the data from http://checkthevotes.com/. A few days ago, when that data was hosted at http://ronrox.com, the page was much simpler to parse; now it is a bit of a mess. I chose this site because it had all the results and counting methods on one page. Its author took his data from politico.com.)

Because of discrepancies between checkthevotes and the NH SOS website (nearly all the towns have +/- 5 vote count differences, and there are/were 19 hand/Diebold misassignments), I wrote a script to parse the ugly HTML files containing the official results. I added a new worksheet to my Google spreadsheet above, and also made the output available on my box.net account (so far, just the Dems). I will post the script at some point if you want to check it.

If you want to use that data in R, just save the spreadsheet as a tab-delimited file.
If you want to make a map of New Hampshire using R, you need the shapefile from the US Census: http://www.census.gov/geo/www/cob/cs2000.html#shp

Here is the R code I used to generate maps and calculate a few linear models:

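# install the right packages (one-time step)
install.packages(c("maptools", "maps"), dependencies=TRUE)
# read.shape(), get.Pcent() and Map2poly() are from the early-2008 maptools API;
# maptools has since been retired from CRAN, so an old (archived) version is needed
library(maptools)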

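# get the vote data (the spreadsheet saved as a tab-delimited file)
nhvotes <- read.delim("nh.txt", header=TRUE)
# one row per town: counting method, Clinton and Obama vote shares, Dem vote total,
# and the Dem/Rep vote ratio as a rough "liberal" index
nhv2 <- data.frame(NAME=nhvotes$township, method=nhvotes$method,
                   cliv=nhvotes$clinton/nhvotes$dem_size, dem_size=nhvotes$dem_size,
                   obav=nhvotes$obama/nhvotes$dem_size, liberal=nhvotes$dem_size/nhvotes$rep_size)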

# get the map data (Census 2000 county subdivision shapefile for NH)
nhshp <- read.shape("cs33_d00.shp")

# get the lat/lon for each town
centroids <- get.Pcent(nhshp)
cents <- data.frame(CS33_D00_I=nhshp$att.data$CS33_D00_I, long=centroids[,1], lat=centroids[,2])
# straight-line distance, in degrees, from the reference point 71W, 42N (southeast of the state)
cents$se_dist <- sqrt((cents$long + 71)^2 + (cents$lat - 42)^2)

# add the centroid lat/lon and distance from SE corner to the map data
map_data <- merge(nhshp$att.data, cents, all.x=T)

# convert the map names to match the vote data: append the LSAD_TRANS type with its
# first letter capitalized, then strip the City/Township/Town suffixes
town_names <- paste(map_data$NAME, paste(toupper(substring(map_data$LSAD_TRANS, 1, 1)), substring(map_data$LSAD_TRANS, 2), sep=""))
town_names <- sub(" City", "", town_names)
town_names <- sub(" Township", "", town_names)  # must run before " Town"
town_names <- sub(" Town", "", town_names)
map_data$NAME <- town_names

# merge the vote info with the map data (make sure it is sorted by the same key that
# the original map data is sorted by so that the subsetting works ok)
tmp <- merge(map_data, nhv2, all.x=T)
map_data <- tmp[sort.list(tmp$CS33_D00_I),]

# make a map object
mp <- Map2poly(nhshp)

# make a plot - color by method
# (draw all town outlines first, so the add=TRUE layers have a plot to draw on)
plot(mp, forcefill=FALSE)
plot(subset(mp, map_data$method == "Hand"), col="yellow", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$method == "Diebold"), col="brown", add=TRUE, forcefill=FALSE)

# make a plot - color by distance from the SE reference point
plot(mp, forcefill=FALSE)
plot(subset(mp, map_data$se_dist < 1), col="blue", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$se_dist >= 1 & map_data$se_dist < 2), col="purple", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$se_dist >= 2), col="red", add=TRUE, forcefill=FALSE)

# color by town size (medium towns split by counting method)
plot(mp, forcefill=FALSE)
plot(subset(mp, map_data$dem_size >= 800), col="lightblue", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Diebold"), col="purple", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Hand"), col="blue", add=TRUE, forcefill=FALSE)
plot(subset(mp, map_data$dem_size < 500), col="grey", add=TRUE, forcefill=FALSE)

# linear models for Clinton's share, with sequential ANOVA tables
fit1 <- lm(cliv ~ dem_size + method, data=map_data)
anova(fit1)

fit2 <- lm(cliv ~ dem_size + method + se_dist + lat, data=map_data)
anova(fit2)

# are latitude and se_dist the variables most tied to the counting method?
# (note: this model uses se_dist as the response, with method as one predictor)
fit3 <- lm(se_dist ~ method + cliv + dem_size + lat, data=map_data)
summary(fit3)

NH primary vote counts different in towns of all sizes

Filed under: diebold, new hampshire — Brian @ 9:43 am

This graph shows all the cities/towns sorted by decreasing size, with a sliding window average of the hand count vs. the Diebold count. It shows that for all town sizes there is a bias toward Hillary in the Diebold machine count. However, the graph also shows a sliding window average of the standard deviations, and the difference between the hand and Diebold counts is well under the standard deviation of the results. If I knew more stats I could calculate a t-test or p-value for the difference; I bet it is not statistically significant, but it sure looks curious.

Average hand vs Diebold vote count by town size
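The sliding-window curves can be sketched in R roughly as follows. This is not the code behind the graph: it assumes towns sorted largest first, a window of a fixed number of consecutive towns (k = 3 here, an arbitrary choice) taken separately within each counting method, and the column names from my R code (method, dem_size, cliv). The function name sliding_stats and the toy rows are made up. The last line runs the t-test mentioned above.

```r
# Moving (sliding-window) mean and SD of x over consecutive windows of k values
sliding_stats <- function(x, k) {
  n <- length(x) - k + 1
  if (n < 1) return(data.frame(mean = numeric(0), sd = numeric(0)))
  windows <- lapply(seq_len(n), function(i) x[i:(i + k - 1)])
  data.frame(
    mean = vapply(windows, mean, numeric(1)),
    sd   = vapply(windows, sd, numeric(1))
  )
}

# Invented towns, for illustration only
towns <- data.frame(
  method   = c("Hand", "Diebold", "Hand", "Diebold", "Hand", "Diebold", "Hand", "Diebold"),
  dem_size = c(300, 2500, 450, 1800, 900, 1200, 150, 700),
  cliv     = c(0.34, 0.41, 0.36, 0.40, 0.38, 0.39, 0.33, 0.37)
)
towns <- towns[order(towns$dem_size, decreasing = TRUE), ]  # largest towns first

# Split Clinton's share by counting method (each group stays largest-first),
# then slide a window of k = 3 towns along each group
by_method <- lapply(split(towns$cliv, towns$method), sliding_stats, k = 3)
by_method$Diebold
by_method$Hand

# The significance question: Welch two-sample t-test of Clinton's share by method
t.test(cliv ~ method, data = towns)
```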

I originally posted this on reddit (Jan 10th):


Blog at WordPress.com.