Some professional statisticians have taken a look at this data and determined that there are no significant correlations between the counting method and Clinton winning, when taking other variables into account. These other variables include location, past voting, affluence, and others.
January 20, 2008
January 14, 2008
The observation that the Diebold optical vote scanners may have a bias against Obama (and/or in favor of Clinton) is disturbing. But, before claiming fraud, we need to take a more careful look at the data. Perhaps the townships which use the scanners are generally larger – and the larger townships tend to like Clinton better? Or maybe the towns with Diebold machines are more conservative/liberal and vote differently for Clinton? Or perhaps there are other socio-economic factors which may correlate with the use of the Diebold machines? Many of these reasons have been proposed, but none seem to negate the vote counting effect.
One fairly obvious variable that has not been checked is location. The experts seem to think that the results make sense based on what they know about the geography and locations of the towns within New Hampshire. But, while I agree that the experts and pundits know that different parts of the state vote for different candidates, no one seems to care about the distribution of the actual vote counting methods within the state (which is the main issue).
Figure legend: The map on the left shows where hand counting and machine counting are used. The map on the right shows where the small, medium and big townships are located, and which of the medium sized towns (500-800 Democratic votes) use hand or machine counting. (I updated this map on Jan 17th with better data.)
My first observation from looking at the left map is: “That explains it! All the towns which use Diebold machines are in the southeast of the state! If it was a patchwork of man vs. machine, then fraud would be more likely, but now I think that this is all just a location effect.”
But then I made the map on the right and noted that the larger towns are also in the SE of the state, which agrees with the previously observed strong correlation between the size of the township and the vote counting method.
This means that when we look at just the medium sized towns, for which the Diebold pro-Clinton bias still exists, the map is now a patchwork! These mid-sized towns do not seem to be grouped geographically by their vote counting method, yet the Diebold bias still exists in that set of towns. So, maybe there really is some fraud there?
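The medium-town comparison described above can be sketched in a few lines of R. This is only a minimal sketch: the `towns` data frame holds invented numbers for illustration (not the real results), and the column names `dem_size`, `method` and `cliv` (Clinton's share of the Democratic vote) are assumed to match the ones in the vote data.

```r
# Toy data only: these numbers are invented for illustration, not real results.
towns <- data.frame(
  dem_size = c(120, 450, 520, 610, 700, 790, 900, 2500),
  method   = c("Hand", "Hand", "Hand", "Diebold", "Hand", "Diebold", "Diebold", "Diebold"),
  cliv     = c(0.33, 0.35, 0.36, 0.40, 0.34, 0.41, 0.42, 0.43)
)

# keep only the medium sized towns (500-800 Democratic votes)
medium <- subset(towns, dem_size >= 500 & dem_size < 800)

# how many medium towns use each counting method?
method_counts <- table(medium$method)

# mean Clinton vote share by counting method, within the medium towns only
share_by_method <- tapply(medium$cliv, medium$method, mean)

print(method_counts)
print(share_by_method)
```

With the real data in place of the toy data frame, a gap between the two means in `share_by_method` would be the Diebold effect within medium towns; it says nothing by itself about whether that gap is significant.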
As you have probably heard, the NH Sec. of State will be doing a manual recount if Kucinich and Albert Howard can pay for the estimated cost. I don’t know if this is allowed by the NH-SOS, but perhaps it would be more affordable if only the mid-sized towns were counted – since that is really the only place where this potential bias is reliably detectable. The small towns and big towns are already biased in terms of their favorite candidate and biased in terms of the vote counting technique, but the medium towns seem to be missing those confounding variables.
[Finally, if you want to repeat this analysis, please feel free to use the data and code that I have used]
NOTE: on Jan 17, I updated the image with accurate counting method data, and better name assignments between Census and NH SOS towns.
This post will keep track of any work that I have done, and anyone is welcome to use it and do a better job than me.
Here is a spreadsheet with the vote totals and counting methods:
(It has been updated a few times; the earliest version had errors that came from my source. I wrote a script to parse the data from http://checkthevotes.com/. A few days ago, when that data was hosted at http://ronrox.com, this page was much simpler to parse. Now it is a bit of a mess. I chose this site because it had all the results and counting methods on one page. He grabbed his data from politico.com.)
Because of discrepancies between checkthevotes and the NH SOS website (nearly all the towns have +/- 5 vote count differences and there are/were 19 hand/Diebold misassignments), I wrote a script to parse the ugly HTML files containing the official results. I added a new worksheet to my Google spreadsheet above, and also made the output available on my box.net account (so far, just the Dems). I will post the script at some point if you want to check it.
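If you want to check for this kind of discrepancy yourself, the comparison can be sketched in R as below. This is not my parsing script, just a rough illustration: the two data frames `ctv` and `sos` and all of their values are made up, and the column names are assumptions.

```r
# Toy data only: invented numbers standing in for the two sources.
ctv <- data.frame(township = c("A", "B", "C"),
                  clinton  = c(100, 250, 80),
                  method   = c("Hand", "Diebold", "Hand"))
sos <- data.frame(township = c("A", "B", "C"),
                  clinton  = c(103, 250, 78),
                  method   = c("Hand", "Diebold", "Diebold"))

# line the two sources up by town name
both <- merge(ctv, sos, by = "township", suffixes = c(".ctv", ".sos"))

# difference in vote counts, and a flag for hand/Diebold disagreements
both$vote_diff <- both$clinton.sos - both$clinton.ctv
both$method_mismatch <- as.character(both$method.ctv) != as.character(both$method.sos)

# towns where the two sources disagree in any way
mismatches <- both[both$vote_diff != 0 | both$method_mismatch, ]
print(mismatches)
```

Note that `merge` silently drops towns whose names do not match exactly between the two sources, so name differences need to be cleaned up first.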
If you want that data usable in R, just save that spreadsheet as a tab delimited file.
If you want to make a map of New Hampshire using R, you need the shapefile from the US Census: http://www.census.gov/geo/www/cob/cs2000.html#shp
Here is the R code I used to generate the maps and calculate a few linear models:
# install the right packages
install.packages(c("maptools", "maps"), dependencies=T)

# get the vote data
nhvotes <- read.delim("nh.txt", header=T)
nhv2 <- data.frame(
  NAME=nhvotes$township,
  method=nhvotes$method,
  cliv=nhvotes$clinton/nhvotes$dem_size,
  dem_size=nhvotes$dem_size,
  obav=nhvotes$obama/nhvotes$dem_size,
  liberal=nhvotes$dem_size/nhvotes$rep_size)

# get the map data
library(maptools)
nhshp <- read.shape("cs33_d00.shp")

# get the lat/lon for each town
centroids <- get.Pcent(nhshp)
cents <- data.frame(CS33_D00_I=nhshp$att.data$CS33_D00_I, long=centroids[,1], lat=centroids[,2])
cents <- data.frame(cents, se_dist=((71+cents$long)**2 + (cents$lat-42)**2)**.5)

# add the centroid lat/lon and distance from SE corner to the map data
map_data <- merge(nhshp$att.data, cents, all.x=T)

# convert the map names
names <- paste(map_data$NAME, paste(toupper(substring(map_data$LSAD_TRANS, 1,1)), substring(map_data$LSAD_TRANS, 2), sep=""))
names <- sub(names, pattern=" City", replacement="")
names <- sub(names, pattern=" Township", replacement="")
names <- sub(names, pattern=" Town", replacement="")
map_data$NAME <- names

# merge the vote info with the map data (make sure it is sorted by the same key that
# the original map data is sorted by so that the subsetting works ok)
tmp <- merge(map_data, nhv2, all.x=T)
map_data <- tmp[sort.list(tmp$CS33_D00_I),]

# make a map object
mp <- Map2poly(nhshp)

# make a plot - color by method
plot(mp,forcefill=FALSE)
plot(subset(mp, map_data$method == "Hand"), col="yellow",add=T, forcefill=F)
plot(subset(mp, map_data$method == "Diebold"), col="brown",add=T, forcefill=F)

# make a plot - color by distance
plot(mp,forcefill=FALSE)
plot(subset(mp, map_data$se_dist < 1), col="blue",add=T, forcefill=F)
plot(subset(mp, map_data$se_dist >= 1 & map_data$se_dist < 2 ), col="purple",add=T, forcefill=F)
plot(subset(mp, map_data$se_dist >= 2 ), col="red",add=T, forcefill=F)

# color by town size
plot(mp,forcefill=FALSE)
plot(subset(mp, map_data$dem_size >= 800 ), col="lightblue",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Diebold"), col="purple",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Hand"), col="blue",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size < 500 ), col="grey",add=T, forcefill=F)

# calculate the linear model and anova
l <- lm(map_data$cliv ~ map_data$dem_size + map_data$method)
anova(l)
l <- lm(map_data$cliv ~ map_data$dem_size + map_data$method + map_data$se_dist + map_data$lat)
anova(l)

# latitude and se_dist are the most important variables in determining the method ?
l <- lm(map_data$se_dist ~ map_data$method + map_data$cliv + map_data$dem_size + map_data$lat)
anova(l)
This shows all the cities/towns sorted in decreasing size, and a sliding window average of Clinton's vote in the hand counted vs. Diebold counted towns. My graph shows that for all town sizes, there is a bias toward Hillary in the Diebold machine count. However, my graph also shows a sliding window average of the standard deviations… and the difference between the hand and Diebold counts is well under the standard deviation of the results. If I knew more stats I could run a t-test and get a p-value for the difference, and I bet it is not statistically significant, but it sure looks curious.
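For anyone who wants to try it, both the sliding window average and the t-test mentioned above are short in R. The following is a minimal sketch and not the code behind my graph: the `towns` data frame is invented toy data, `rolling_mean` is a hypothetical helper name, and the window size of 3 is an arbitrary choice.

```r
# Toy data only: invented towns, not the real NH results.
towns <- data.frame(
  dem_size = c(3000, 2200, 1500, 1200, 900, 700, 600, 450, 300, 200, 150, 90),
  method   = c("Diebold", "Diebold", "Diebold", "Hand", "Diebold", "Hand",
               "Diebold", "Hand", "Diebold", "Hand", "Hand", "Hand"),
  cliv     = c(0.42, 0.40, 0.41, 0.36, 0.39, 0.35,
               0.40, 0.33, 0.37, 0.34, 0.31, 0.38)
)

# sort the towns in decreasing size
towns <- towns[order(-towns$dem_size), ]

# centered sliding window average over k neighbouring towns (NA at both ends)
rolling_mean <- function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Clinton vote share per town, split by counting method (still sorted by size)
hand    <- towns$cliv[towns$method == "Hand"]
diebold <- towns$cliv[towns$method == "Diebold"]
hand_smooth    <- rolling_mean(hand)
diebold_smooth <- rolling_mean(diebold)

# Welch two-sample t-test (R's default; does not assume equal variances)
tt <- t.test(diebold, hand)
print(tt$statistic)
print(tt$p.value)
```

A plain t-test like this treats every town equally and ignores town size and location, so it would complement the linear models rather than replace them.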
I originally posted this on reddit (Jan 10th):