Election Stats

October 21, 2008

Election night prediction tool

Filed under: voting — Brian @ 10:36 pm

I am currently updating a tool I wrote which can help to make projections on election night.  I used it during the primaries and it simply takes the current vote tallies per county, along with the “% precincts counted” and then extrapolates each county individually to arrive at the totals for the whole state.  This can be super helpful in cases where a big city may only have 10% of the votes counted whereas the rest of the state might already be 90% done – in which case the overall numbers will have a big skew away from the city voting patterns.
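Here is a minimal sketch of that per-county extrapolation in R. The county names and every number in it are made up for illustration, and it assumes that the uncounted precincts in a county will vote like the ones already counted there (and that precincts are roughly the same size), which is only a rough approximation.

```r
# Hypothetical county tallies; every number here is made up for illustration.
counties <- data.frame(
  county      = c("Big City", "Suburb", "Rural"),
  cand_a      = c(3000, 40000, 18000),  # votes counted so far for candidate A
  cand_b      = c(1000, 45000, 27000),  # votes counted so far for candidate B
  pct_counted = c(10, 90, 90)           # % of precincts reporting
)

# Scale each county's counted votes up to 100% of its precincts.
scale_up <- 100 / counties$pct_counted
proj_a <- sum(counties$cand_a * scale_up)
proj_b <- sum(counties$cand_b * scale_up)

# Candidate A's share of the two-candidate vote: running total vs. projection.
raw_share_a  <- sum(counties$cand_a) / sum(counties$cand_a + counties$cand_b)
proj_share_a <- proj_a / (proj_a + proj_b)

cat(sprintf("Raw: %.1f%% for A; projected: %.1f%% for A\n",
            100 * raw_share_a, 100 * proj_share_a))
```

In this made-up case candidate A trails in the running total but leads in the projection, because the big city that favors A has barely started counting.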

Anyway, subscribe to my feed and I’ll post a link here when it is ready.  And, obviously it will only be useful for a couple hours on Nov 4th – less than 20 days away!


October 18, 2008

Great poll tracking sites

Filed under: polls, voting — Brian @ 10:45 pm

I’m sure that all of my tiny handful of readers have already found these sites… but these are some great sites that carefully track the many state-by-state and overall polls.  They have a pro-Obama bias in their writing/blog posts, but they do seem to be open minded and fair when it comes to interpreting the polling data.  The problem right now is that they seem to be arguing about how to get the most accurate numbers out of all these polls, based on how well each polling method/company did during the primaries or previous elections… BUT, we are still 3 weeks away and even if the polls were 100% accurate they still wouldn’t predict what will happen on election day, since who knows what could happen between now and then.  Stock market jumps to 14,000?  Terror?  Major gaffe?




Even though Obama has got momentum and a serious advantage, I think it is going to be very close.

February 20, 2008

WA state primary prediction: Obama will keep his lead

Filed under: primary — Brian @ 11:00 am

About 60-80% of the ballots in the WA State Democrat primary have been counted, and using those numbers, along with the already counted results in each county, I have extrapolated the final vote counts in all the counties for Clinton and Obama.  Right now, Obama is winning with 51.6% vs. 48.4% for Clinton (ignoring all other candidates on the ballot).  After the final votes are counted, I am predicting that Obama will slightly widen his lead (mostly because King County still has more remaining ballots than most other counties) with a final fraction of 51.8% vs 48.2%.

Another way to look at this is that of the ~500,000 Democrat ballots counted, Obama got 16,000 more than Clinton.  After counting the remaining ~200,000 ballots, Obama should widen his lead to 25,000.
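As a quick sanity check, the arithmetic behind those two margins can be redone in a few lines of R. This is only a sketch using the rounded figures quoted above (about 500,000 counted, about 200,000 remaining, 51.6% now and 51.8% predicted), not the per-county data behind the actual prediction.

```r
counted     <- 500000   # approx. ballots counted so far (rounded, from the post)
remaining   <- 200000   # approx. ballots still to be counted (rounded)
share_now   <- 0.516    # Obama's current two-candidate share
share_final <- 0.518    # Obama's predicted final two-candidate share

# Lead = (Obama share - Clinton share) * number of ballots
lead_now   <- (share_now   - (1 - share_now))   * counted
lead_final <- (share_final - (1 - share_final)) * (counted + remaining)

# Share Obama would need among the remaining ballots to reach the prediction
share_remaining <- (share_final * (counted + remaining) - share_now * counted) / remaining

cat(round(lead_now), round(lead_final), round(100 * share_remaining, 1), "\n")
```

With these rounded inputs the leads come out to about 16,000 and 25,200, and the prediction implies that Obama takes roughly 52.3% of the ballots still to be counted.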

Here is a web page showing the data: http://www.distantconstellation.com/elections/WA/wa_pred.cgi

This is not a trivial or expected result.  For example, it could have been that the strong Obama counties had finished counting their ballots, and most of the remaining pro-Hillary counties still had a lot of counting to do, and the close results could have been swapped by the time the counting is finished.  That’s why the media folks have not predicted the results yet.  Another bias is that since this election had many absentees (postmarked – not received – by Tues, Feb 19), then the later absentees could have a different preference than the early absentees.

January 30, 2008

How will Edwards’ supporters vote?

Filed under: edwards, florida, new hampshire, obama, south carolina — Brian @ 11:07 pm

I’ve assumed that Edwards and Obama are drawing from the same pool of voters, and that Edwards has been drawing votes away from Obama. But, maybe Edwards and Clinton are both fighting over the traditional Dems and Obama is getting a lot of Independent support? Or, in South Carolina, even though Obama had a great day, one observation from the exit polling was that if you only looked at white voters, then Edwards would have won, with an estimated 40% of the white vote (Clinton got ~36% and Obama got ~24%). So, maybe Edwards’ voters are uncomfortable voting for Obama? Will they all vote for Hillary?

Well, we can take a look at Exit Polls to get some clues. Unfortunately, they never ask: “If you didn’t vote for your #1 candidate, whom would you vote for?” However, in New Hampshire, South Carolina and Florida, there are some tangential questions in the exit polls which relate to how they feel about the other candidates.

But, it is actually kind of hard to read the exit poll results. You have to know which rows and columns add up to 100% (or close) and which ones are a breakdown of the other. For example, here is one table from the South Carolina exit poll (which combines two questions), asking about the gender and race of the voter:

              % Total  Clinton  Edwards  Kucinich  Obama
Black male         20       17        3         0     80
Black female       35       20        2         0     78
White male         19       29       44         0     27
White female       27       42       35         0     22

The proper way to convert that data table into a sentence or paragraph format is like this: Of the voters in the SC primary, 20% were black males, 35% were black females, 19% were white males and 27% were white females. Of the white males (for example), 44% voted for Edwards, 29% voted for Clinton and 27% voted for Obama; on the other hand, of the white females, 42% voted for Clinton and only 35% and 22% voted for Edwards and Obama, respectively.

Easy, right? The rows (excluding the first column) each add up to 100%. That is why we can write the sentences we wrote above: each row is a breakdown of that particular category.

However, sometimes we don’t want to phrase our explanation the same way that we did above. Sometimes we want to know which answers were chosen by supporters of a particular candidate. But, none of the candidate columns add up to anything useful! To demonstrate this issue, it is useful to look at a table where the rows (answers) are not evenly distributed.

Here is another exit poll question from SC: “Do you think this country is ready to elect a black president?”

           % Total  Clinton  Edwards  Kucinich  Obama
Ready           77       21       16         0     63
Not ready       22       48       29         0     23

So, 77% of voters think the country is ready to elect a black president and 22% of voters think we’re not ready. Looking at the table, you might be tempted to conclude that most of Edwards’ voters think the country is not ready (29 is higher than 16, right?). Wrong. Since only 22% of voters think we are not ready, and 29% of the people choosing that answer voted for Edwards, the voters who both chose Edwards and think we are not ready make up 29% of 22%, or only about 6% of all the voters. And, we know from the overall results that 18% of all the voters chose Edwards, so only about a third of them (6 out of 18) think we are not ready, which means that MOST of the Edwards voters actually think we ARE ready for a Black President.

To try and make these exit poll tables more intuitive, I like to convert them to a different format. Imagine that there are exactly 100 voters, and that each number in the table shows the total number of voters with that particular candidate and answer combination (to convert the table, just multiply the percentage in the candidate column by the percentage of people choosing a particular answer). So, after you convert the normal exit poll table to a 100-person (or 100%) table, it ends up like this:

                                     Clinton  Edwards  Kucinich  Obama
USA Ready for a Black President           16       12         0     49
USA Not ready for a Black President       11        6         0      5
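The conversion described above can be sketched in R as follows. The numbers are the SC exit poll figures from the "ready to elect a black president" table; the variable names are just my own choices for this illustration.

```r
# Published SC exit poll table: each row sums to ~100% across the candidates.
by_answer <- rbind(
  "Ready"     = c(Clinton = 21, Edwards = 16, Kucinich = 0, Obama = 63),
  "Not ready" = c(Clinton = 48, Edwards = 29, Kucinich = 0, Obama = 23)
)

# % of all voters who gave each answer (the "% Total" column).
pct_total <- c("Ready" = 77, "Not ready" = 22)

# Multiply every row by the share of voters choosing that answer.
# (The vector has one entry per row, so R recycles it down the rows.)
per_100 <- round(by_answer * pct_total / 100)

print(per_100)
print(colSums(per_100))  # roughly each candidate's overall share of the vote
```

The column sums are a handy check: they should land close to the overall results for each candidate.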

Now this can be understood fairly easily. The 100 voters (or 100%) are distributed among all the boxes in the table. The rows add up to the total percentage choosing that answer in the poll and the columns add up to the total percentage that each candidate received. This table would be read as follows: Of all the voters in the primary, 12% voted for Edwards and also thought the US is ready for a Black President, and 6% of all voters chose Edwards but thought the US is NOT ready for a Black President (or 2/3 of the Edwards voters thought we are ready for a Black President).

(There is another way to convert these tables, which is to have each COLUMN add up to 100%, but then the rows are hard to interpret. I like this format shown here, because you can intuitively understand the rows or the columns – even though you might have to do some extra calculations in your head to get some summary percentages.)
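For comparison, here is a small R sketch of that other conversion, where each COLUMN adds up to 100%. It starts from the 100-person SC table for the "ready for a Black President" question; the variable names are only for this illustration.

```r
# 100-person table for the SC "ready for a Black President" question.
per_100 <- rbind(
  "Ready"     = c(Clinton = 16, Edwards = 12, Kucinich = 0, Obama = 49),
  "Not ready" = c(Clinton = 11, Edwards = 6,  Kucinich = 0, Obama = 5)
)

# Drop candidates with no voters in the table to avoid dividing by zero.
per_100 <- per_100[, colSums(per_100) > 0]

# Divide each column by its own total, so each candidate's column sums to 100%.
by_candidate <- round(100 * prop.table(per_100, margin = 2))

print(by_candidate)
```

Read down a column and you get the breakdown of one candidate's voters, e.g. about 67% of Edwards voters said "Ready". Read across a row, though, and the numbers no longer add up to anything meaningful, which is why I prefer the 100-person format.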

Here are some other exit poll results, from other states and shown in the 100-person converted format which relate to how the voters feel about Clinton and Obama:

Do you think this country is ready to elect a woman president?

           Clinton  Edwards  Kucinich  Obama
Ready           25       13         0     38
Not ready        2        6         0     15

Summary from South Carolina: A large majority (more than 2/3) of both Edwards and Obama voters think that the country is ready for a woman president. And, a majority of Edwards and Clinton voters also think the US is ready for a Black President (see above).

Do you think this country is ready to elect a black president?

           Clinton  Edwards  Kucinich  Obama
Ready           32        9         1     30
Not ready       16        6         0      3

Do you think this country is ready to elect a female president?

           Clinton  Edwards  Kucinich  Obama
Ready           46        8         0     26
Not ready        2        6         0      7

No matter how you voted today, how would you feel if Hillary Clinton wins the nomination:

              Clinton  Edwards  Kucinich  Obama
Satisfied        48.8      6.4       0.8   24.0
Dissatisfied      0.6      7.6       0.2    9.8

No matter how you voted today, how would you feel if Barack Obama wins the nomination:

              Clinton  Edwards  Kucinich  Obama
Satisfied        29.4      7.0       0.7   32.9
Dissatisfied     18.9      7.5       0.3    0.9

Summary of Florida: Edwards voters in Florida are less willing to believe that the US is ready for a black or woman President than they were in SC, and they are about evenly split between being satisfied and dissatisfied with either Clinton or Obama winning the nomination.


Is your opinion of Hillary Clinton:

             Biden  Clinton  Dodd  Edwards  Gravel  Kucinich  Obama  Richardson
Favorable      0.0     37.7   0.0     11.1     0.0       0.7   19.2         3.0
Unfavorable    0.3      0.3   0.0      6.5     0.0       0.5   16.0         1.5

Is your opinion of Barack Obama:

             Biden  Clinton  Dodd  Edwards  Gravel  Kucinich  Obama  Richardson
Favorable      0.0     26.0   0.0     15.1     0.0       1.7   35.3         4.2
Unfavorable    0.3     11.4   0.0      2.4     0.0       0.3    0.2         0.6

Summary of New Hampshire: 11% of NH voters voted for Edwards and have a favorable opinion of Hillary, while 6.5% voted for Edwards and have an unfavorable opinion of her (i.e. 37% of Edwards voters have an unfavorable opinion of Clinton). On the other hand, only 14% of Edwards voters have an unfavorable opinion of Obama.

All these converted questions in a Google Spreadsheet

Overall conclusion? I think that Edwards’ supporters are somewhat evenly split between Clinton and Obama, but I think Obama will gain slightly more from Edwards dropping out of the race than Clinton will – mainly due to the slightly higher unfavorable view of Clinton among Edwards voters in New Hampshire.

January 29, 2008

Florida Primary – preparing to follow results

Filed under: florida — Brian @ 10:51 am

Wow! The Florida SOS has set up a really great website to report the results for today’s primary. Interestingly, of their 67 counties, 14 use touch-screen voting machines. Also, in addition to absentee and polling place voting, they have early-voting where you can go to a nearby govt office and cast your vote up to 2 weeks before today.

They even have a way to just download the data in a tab-delimited file (see link on left hand side of their page)! Luckily, they don’t care if you fill out the form with a GET or POST, so here is a direct link (you can use wget, too):


County election offices.

I made a script to calculate the predicted totals, based on differing voting patterns in different counties.

One thing I always like to do here at home in Washington State is get the early returns and use the differences in county voting patterns to extrapolate the totals for the whole state (based on estimates of turnout). This is usually interesting in WA because we have a large chunk of people who vote absentee, but the law here only requires the ballots to be MAILED by election day, so we end up waiting a few days to see how the election ends up. In a close race with lots of absentees, we have a lot of elections dragging out for days.

In Florida, there are also a lot of absentee ballots, but they are required to have them ARRIVE by election day (I think). So, perhaps the first reports on the Florida website will have all the absentees in one batch. This makes it easier to see if there are differences in voting patterns between the absentees and the poll voters.

January 27, 2008

Super Tuesday states and the race of their voters

Filed under: race, super tuesday — Brian @ 10:33 pm

Super Tuesday is coming up. Given how differently the candidates were supported depending on the voters’ race (in South Carolina in particular), I thought it would be interesting to see the racial makeup of the Super Tuesday states. For states where exit polls exist from the 2004 Democratic primary, I reported those values. I also went to the Census to find the fraction of the population which is African American in all the upcoming Super Tuesday states.

Also, here’s a striking map from the US Census showing where African Americans make up larger fractions of the population – on a county-by-county basis.

On another note, it looks like Latino voters are not too fond of Obama. In Nevada (the only state so far with appreciable numbers of Latinos), he got his lowest support from them. This could be bad for him in those states with high numbers of Latinos (CA, AZ, NY).  An interesting article from the Washington Post goes into more details about the importance of the Latino vote in California.

January 26, 2008

Obama’s SC win did not need a large black turnout

Filed under: obama, race, south carolina — Brian @ 9:18 pm

Obama would have won South Carolina even if only 18% of the voters were black.

SC Dem Primary (2008) by hypothetical % African American Turnout

I used these pieces of data to estimate the candidates’ results had the African American turnout been much lower:

With 99% reporting:

  • Obama had 55% (295,091)
  • Clinton had 27% (141,128)
  • Edwards had 18% (93,552)

And Exit Poll Data:

  • 55% of Democrat primary voters were African American
    • breakdown: Obama:78%, Clinton:19%, Edwards:2%
  • 45% were white or other
    • breakdown: Obama:24%, Clinton:36%, Edwards:40%

This leads to an estimate of 529,771 total voters (for the top 3), of which 291,374 were cast by African Americans and 238,397 were cast by whites (and others).

If we assume the total number of white voters is constant and reduce the black turnout (but keeping the same candidate distribution), then we can estimate what the results would have been had the black voter turnout been much lower.

In fact, the black share of the turnout could have been as low as 18%, and Obama would have still won!
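Here is a minimal R sketch of that estimate, using only the exit poll percentages and the white vote count listed above. The function name shares_at and the 1%-step grid are my own choices for this illustration; the shares are renormalized because the black-voter breakdown for the top three only adds up to 99%.

```r
# Exit poll vote shares (from above) for the top three candidates.
black <- c(Obama = 0.78, Clinton = 0.19, Edwards = 0.02)
white <- c(Obama = 0.24, Clinton = 0.36, Edwards = 0.40)
white_votes <- 238397   # held constant while black turnout is varied

# Candidate shares when black voters make up fraction b of all voters.
shares_at <- function(b) {
  black_votes <- white_votes * b / (1 - b)
  votes <- black * black_votes + white * white_votes
  votes / sum(votes)
}

print(round(100 * shares_at(0.55), 1))  # the actual turnout mix
print(round(100 * shares_at(0.18), 1))  # the hypothetical low-turnout case

# Who finishes first at each black share from 1% to 60%?
b_grid <- seq(0.01, 0.60, by = 0.01)
winner <- sapply(b_grid, function(b) names(which.max(shares_at(b))))

# Smallest whole-percent black share at which Obama still finishes first.
print(min(b_grid[winner == "Obama"]))
```

Below that threshold it is Edwards, not Clinton, who comes out on top in this simple model.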

It would be sad to see the Clintons or the media spin Obama’s good showing in South Carolina as solely due to the very strong African American turnout or the unique demographics in SC. Even in a state with an average share of African American voters (and the same candidate breakdown*), Obama would still have won. Given that a vast majority of African Americans vote for Democrats, about half of US voters are Democrats, and about 12% of the overall population is African American, 18%-20% is a decent estimate for the fraction of voters in a typical Democratic primary who are African American.

*(Note: this assumption that there will be the same candidate breakdown in future states is of course not true, since Edwards’ support will not likely get any better than it was in his home state of South Carolina.  The big question is: does Edwards take away votes from Clinton or Obama?)

Interestingly, Clinton was the most race-neutral candidate. She had fairly steady support (19% of black voters and 36% of white voters), while Edwards had very strong white support (40% of white voters) and almost no black support (2% of black voters). As a result, if the black share of the turnout had been only 18%, it would have been a virtual dead heat, with all three candidates getting about 33% (but Obama getting slightly more). Also, Clinton would not have won at any black-turnout level.

In the 2004 Democrat Primaries in South Carolina, 47% of the voters were black, and in this primary, 55% of the voters were black. So, even with average black turnout in South Carolina, the vote would have still favored Obama by a very large margin.

January 20, 2008

Real Statisticians look at the NH primary

Filed under: diebold, new hampshire — Brian @ 9:36 pm

Some professional statisticians have taken a look at this data and determined that there are no significant correlations between the counting method and Clinton winning, when taking other variables into account. These other variables include location, past voting, affluence, and others.


January 14, 2008

Vote Counting methods, drawn on a NH map

Filed under: diebold, new hampshire — Brian @ 12:48 pm

The observation that the Diebold optical vote scanners may have a bias against Obama (and/or in favor of Clinton) is disturbing. But, before claiming fraud, we need to take a more careful look at the data. Perhaps the townships which use the scanners are generally larger – and the larger townships tend to like Clinton better? Or maybe the towns with Diebold machines are more conservative/liberal and vote differently for Clinton? Or perhaps there are other socio-economic factors which may correlate with the use of the Diebold machines? Many of these reasons have been proposed, but none seem to negate the vote counting effect.

One fairly obvious variable that has not been checked is location. The experts seem to think that the results make sense based on what they know about the geography and locations of the towns within New Hampshire. But, while I agree that the experts and pundits know that different parts of the state vote for different candidates, no one seems to care about the distribution of the actual vote counting methods within the state (which is the main issue).

NH map of vote counting methods

Figure legend: On the left is a map showing where hand counting and machine counting is used, and on the right shows where the small, medium and big townships are located – and the locations of hand and machine counting for medium sized towns (500-800 democrat votes).  (I have updated this map on Jan 17th with better data).

My first observation from looking at the left map is: “That explains it! All the towns which use Diebold machines are in the southeast of the state! If it was a patchwork of man vs. machine, then fraud would be more likely, but now I think that this is all just a location effect.”

But, then I made the map on the right and noted that the larger towns are also in the SE of the state – which agrees with the previously observed strong correlation between the size of the township and the vote counting method.

This means that when we look at just the medium sized towns – for which the Diebold pro-Clinton bias still exists, the map is now a patchwork! Thus, these mid-sized towns seem to not be grouped by their vote counting method and the Diebold bias still exists in that set of towns. So, maybe there really is some fraud there?

As you have probably heard, the NH Sec. of State will be doing a manual recount if Kucinich and Albert Howard can pay for the estimated cost. I don’t know if this is allowed by the NH-SOS, but perhaps it would be more affordable if only the mid-sized towns were counted – since that is really the only place where this potential bias is reliably detectable. The small towns and big towns are already biased in terms of their favorite candidate and biased in terms of the vote counting technique, but the medium towns seem to be missing those confounding variables.

[Finally, if you want to repeat this analysis, please feel free to use the data and code that I have used]

NOTE: on Jan 17, I updated the image with accurate counting method data, and better name assignments between Census and NH SOS towns.

Raw Data for NH Primary – DIY stats

Filed under: diebold, new hampshire — Brian @ 12:14 pm

This post will keep track of any work that I have done, and anyone is welcome to use it and do a better job than me.

Here is a spreadsheet with the vote totals and counting methods:


(it has been updated a few times, the earliest version had errors based upon my source. I wrote a script to parse the data from http://checkthevotes.com/. A few days ago, when that data was hosted at http://ronrox.com, this page was much simpler to parse. Now it is a bit of a mess. The reason I chose this site is because it had all the results and counting methods on one page. He grabbed his data from politico.com).

Because of discrepancies between the checkthevotes and the NH SOS website (nearly all the towns have +/- 5 vote count differences and there are/were 19 incorrect hand/Diebold misassignments), I wrote a script to parse the ugly HTML files containing the official results. I added a new worksheet to my Google spreadsheet above, and also made the output available on my box.net account (so far, just the Dems). I will post the script at some point if you want to check it.

If you want that data usable in R, just save that spreadsheet as a tab delimited file.
If you want to make a map of New Hampshire using R, you need the shapefile from the US Census: http://www.census.gov/geo/www/cob/cs2000.html#shp

Here is the R code I used to generate maps and calculate a few linear models:

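# install and load the right packages
# (read.shape, get.Pcent and Map2poly below come from the maptools package)
install.packages(c("maptools", "maps"), dependencies=T)
library(maptools)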

# get the vote data
nhvotes <- read.delim("nh.txt", header=T)
nhv2 <- data.frame( NAME=nhvotes$township, method=nhvotes$method, cliv=nhvotes$clinton/nhvotes$dem_size, dem_size=nhvotes$dem_size, obav=nhvotes$obama/nhvotes$dem_size, liberal=nhvotes$dem_size/nhvotes$rep_size)

# get the map data
nhshp <- read.shape("cs33_d00.shp")

# get the lat/lon for each town
centroids <- get.Pcent(nhshp)
cents <- data.frame(CS33_D00_I=nhshp$att.data$CS33_D00_I, long=centroids[,1], lat=centroids[,2])
cents <- data.frame(cents, se_dist=((71+cents$long)**2 + (cents$lat-42)**2)**.5)

# add the centroid lat/lon and distance from SE corner to the map data
map_data <- merge(nhshp$att.data, cents, all.x=T)

# convert the map names
names <- paste(map_data$NAME, paste(toupper(substring(map_data$LSAD_TRANS, 1,1)), substring(map_data$LSAD_TRANS, 2), sep=""))
names <- sub(names, pattern=" City", replacement="")
names <- sub(names, pattern=" Township", replacement="")
names <- sub(names, pattern=" Town", replacement="")
map_data$NAME <- names

# merge the vote info with the map data (make sure it is sorted by the same key that
# the original map data is sorted by so that the subsetting works ok)
tmp <- merge(map_data, nhv2, all.x=T)
map_data <- tmp[sort.list(tmp$CS33_D00_I),]

# make a map object
mp <- Map2poly(nhshp)

# make a plot -  color by method
plot(subset(mp, map_data$method == "Hand"), col="yellow",add=T, forcefill=F)
plot(subset(mp, map_data$method == "Diebold"), col="brown",add=T, forcefill=F)

# make a plot - color by distance
plot(subset(mp, map_data$se_dist < 1), col="blue",add=T, forcefill=F)
plot(subset(mp, map_data$se_dist >= 1 & map_data$se_dist < 2 ), col="purple",add=T, forcefill=F)
plot(subset(mp, map_data$se_dist >= 2 ), col="red",add=T, forcefill=F)

# color by town size
plot(subset(mp, map_data$dem_size >= 800 ), col="lightblue",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Diebold"), col="purple",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size >= 500 & map_data$dem_size < 800 & map_data$method == "Hand"), col="blue",add=T, forcefill=F)
plot(subset(mp, map_data$dem_size < 500 ), col="grey",add=T, forcefill=F)

# calculate the linear model and anova
l <- lm(map_data$cliv ~ map_data$dem_size + map_data$method)
anova(l)

l <- lm(map_data$cliv ~ map_data$dem_size + map_data$method + map_data$se_dist + map_data$lat)
anova(l)

# latitude and se_dist are the most important variables in determining the method ?
l <- lm(map_data$se_dist ~ map_data$method + map_data$cliv + map_data$dem_size + map_data$lat)