Fun with Data: A Monte Carlo Simulation of the 2016 Presidential Election

Unless you’ve been living under a rock for the past few months, you know that tomorrow marks the end (hopefully) of a highly contentious and divisive election. You’ve probably seen several forecasts built on different statistical models, including those from 538, Jay DeSart, and the New York Times. You might have even seen the Twitter fight between Nate Silver and the wonderfully unbiased people at the Huffington Post. All of these models make different assumptions, and 538 even has two different models: one that incorporates historical data and one that uses only polls.

So I’ve decided to throw my hat into the prediction game using a very simple Monte Carlo simulation. But just like the other methods, I have to outline my assumptions:

  1. Recent polls may not be entirely accurate, but they are at least a decent estimate of the true voting proportions in each state.
  2. Given recent polls, the results in each state are INDEPENDENT. Many of the other models do not make this assumption; they use historical trends to predict blocks of states (e.g., a president who won Georgia always won South Carolina). I think these trends are already reflected in the polling data.
  3. Independent and third-party candidates will not win any states. I’m actually planning to vote for Gary Johnson tomorrow, but I don’t realistically expect him, Jill Stein, or Evan McMullin to take any states. Hope that doesn’t burst any bubbles.

So here’s the strategy. For each state, I take at most the 4 most recent polls of likely voters, which I got from 538’s state-by-state predictions. For the most part, these are polls taken after October 26th, but in a few cases, like Maine’s district 1 and 3 polls, I had to use the only poll available, which was from September.
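
The poll-selection step can be sketched as follows. This is a minimal illustration, not the code behind this post: the Poll record, its field names, and every number in example_polls are hypothetical stand-ins for 538’s state-by-state poll listings.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record for one likely-voter poll; field names are illustrative,
# not 538's actual data format.
@dataclass
class Poll:
    state: str
    end_date: date
    clinton: float  # Clinton share, in percent
    trump: float    # Trump share, in percent

def most_recent_polls(polls, state, k=4):
    """Return up to k of the most recent polls for one state, newest first."""
    in_state = [p for p in polls if p.state == state]
    in_state.sort(key=lambda p: p.end_date, reverse=True)
    return in_state[:k]

# Made-up example polls (not real data): five for Florida, one for ME-1.
example_polls = [
    Poll("FL", date(2016, 10, 20), 46.0, 45.0),
    Poll("FL", date(2016, 10, 27), 45.0, 46.0),
    Poll("FL", date(2016, 10, 30), 47.0, 44.0),
    Poll("FL", date(2016, 11, 2), 46.0, 46.0),
    Poll("FL", date(2016, 11, 4), 45.0, 45.0),
    Poll("ME-1", date(2016, 9, 15), 50.0, 38.0),
]

recent_fl = most_recent_polls(example_polls, "FL")     # oldest FL poll is dropped
recent_me1 = most_recent_polls(example_polls, "ME-1")  # only one poll available
```

A state with fewer than 4 polls simply keeps all of them, which matches the fallback to a single September poll described above.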

For each simulation, in each state I randomly draw one of these polls and normalize the Clinton and Trump percentages by dividing each by the combined share of voters who would vote for either major candidate. I then draw a Bernoulli random variable with probability that Clinton wins = (Clinton %)/(Clinton % + Trump %), which follows from my assumption that no independents will win. Finally, I assign that state’s electoral votes to the winner.
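
One simulated election, following the steps in the paragraph above, might look like the sketch below. The three-state ELECTORAL_VOTES table and the POLLS values are hypothetical placeholders; a full run would list every state, DC, and the Maine/Nebraska districts (538 electoral votes in total), each treated as its own winner-take-all unit.

```python
import random

# Hypothetical inputs: electoral votes and (Clinton %, Trump %) poll pairs
# for a few states only. These numbers are illustrative, not real polls.
ELECTORAL_VOTES = {"FL": 29, "PA": 20, "NH": 4}
POLLS = {
    "FL": [(46.0, 45.0), (45.0, 46.0), (47.0, 44.0), (46.0, 46.0)],
    "PA": [(48.0, 43.0), (46.0, 44.0)],
    "NH": [(44.0, 44.0)],
}

def clinton_win_prob(clinton_pct, trump_pct):
    """Two-party normalization: third-party and undecided voters are ignored."""
    return clinton_pct / (clinton_pct + trump_pct)

def simulate_once(rng):
    """Run one simulated election and return Clinton's electoral vote total."""
    clinton_ev = 0
    for state, ev in ELECTORAL_VOTES.items():
        clinton_pct, trump_pct = rng.choice(POLLS[state])  # draw one poll at random
        p = clinton_win_prob(clinton_pct, trump_pct)
        if rng.random() < p:  # Bernoulli(p): Clinton carries the state
            clinton_ev += ev
    return clinton_ev

rng = random.Random(2016)  # seeded so a run can be reproduced
result = simulate_once(rng)
```

Each state gets its own fresh random draws, which is exactly the independence assumption (2) above; a model with correlated state outcomes would need a shared random shock across states instead.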

I did this 100,000 times and got the following histogram of electoral vote totals for Clinton.


Here the blue line marks the number of electoral votes required to win: 270. In 59.624% of these simulations Hillary Clinton reached the 270 votes needed to win, while Donald Trump reached 270 in only 38.15%. The remaining 2.226% ended in a 269-269 tie, which would send the final decision to the House of Representatives, something that has happened only twice (after the 1800 and 1824 elections).
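
Because only two candidates can win electoral votes and there are 538 in total, the only way neither reaches 270 is a 269-269 tie, so the three percentages must sum to 100 (59.624 + 38.15 + 2.226 = 100). A minimal sketch of that bookkeeping, assuming the simulation returns a list of Clinton totals; the totals passed in below are made up for illustration, not this post’s output:

```python
from collections import Counter

TOTAL_EV = 538
TO_WIN = 270

def outcome(clinton_ev):
    """Classify one simulated Clinton total; Trump gets the remainder."""
    trump_ev = TOTAL_EV - clinton_ev
    if clinton_ev >= TO_WIN:
        return "clinton"
    if trump_ev >= TO_WIN:
        return "trump"
    return "house"  # 269-269 tie: the House of Representatives decides

def outcome_shares(clinton_totals):
    """Fraction of simulations ending in each outcome."""
    counts = Counter(outcome(x) for x in clinton_totals)
    n = len(clinton_totals)
    return {k: counts[k] / n for k in ("clinton", "trump", "house")}

# Made-up totals for illustration only.
shares = outcome_shares([283, 269, 250, 300])
```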

On average, Clinton received 281.8051 electoral votes and Trump received 256.1949. The medians were similar: 283 electoral votes for Clinton and 255 for Trump. The 95% interval of the simulated totals (the middle 95% of outcomes) was 183 to 376 for Clinton and 162 to 355 for Trump.
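
Since every simulation splits exactly 538 electoral votes, Trump’s total is always 538 minus Clinton’s, so the two means add to 538 (281.8051 + 256.1949), and here the medians and interval endpoints mirror as well (283 + 255, 183 + 355, 376 + 162). The sketch below computes these summaries from a list of Clinton totals. The nearest-rank percentile is an assumed convention (software packages differ slightly), and the five totals are made-up values chosen only to show the mirroring.

```python
import math
import statistics

TOTAL_EV = 538

def percentile(sorted_vals, q):
    """Nearest-rank empirical percentile (one of several common conventions)."""
    rank = max(1, math.ceil(q * len(sorted_vals)))
    return sorted_vals[rank - 1]

def summarize(clinton_totals):
    """Mean, median, and central 95% interval for both candidates."""
    vals = sorted(clinton_totals)
    trump = sorted(TOTAL_EV - x for x in vals)  # Trump gets the remainder
    return {
        "clinton_mean": statistics.mean(vals),
        "trump_mean": statistics.mean(trump),
        "clinton_median": statistics.median(vals),
        "trump_median": statistics.median(trump),
        "clinton_95": (percentile(vals, 0.025), percentile(vals, 0.975)),
        "trump_95": (percentile(trump, 0.025), percentile(trump, 0.975)),
    }

# Made-up totals; with only five values the 95% interval is just (min, max).
s = summarize([183, 250, 283, 300, 376])
```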

It’s also worth noting that, based on these simulations, Hillary Clinton has a substantial 18.463% chance of breaking the record for the largest share of the electoral college, 60.8%.
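
Turning a share into an electoral vote threshold: 60.8% of 538 is about 327.1, so exceeding that share requires at least 328 electoral votes (327/538 is about 60.78%). A small sketch of the calculation, assuming the simulated Clinton totals are available as a list; the example totals are made up:

```python
import math

TOTAL_EV = 538
RECORD_SHARE = 0.608  # the threshold quoted in the post

def min_ev_to_exceed(share, total=TOTAL_EV):
    """Smallest whole number of electoral votes strictly above share * total."""
    return math.floor(share * total) + 1

def prob_exceeds(clinton_totals, share=RECORD_SHARE):
    """Fraction of simulated totals that exceed the given electoral vote share."""
    threshold = min_ev_to_exceed(share)
    return sum(1 for x in clinton_totals if x >= threshold) / len(clinton_totals)

# Made-up totals for illustration only.
p = prob_exceeds([300, 327, 328, 376])
```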

Anyway, if I had to place a bet based on this and other predictions, I’d take Hillary Clinton as the winner with 283 electoral votes. In other words, I think it’s going to be a lot closer than most people (except 538) are predicting.

Remember: “All models are wrong, but some are useful.” Before I get berated by HuffPo for this conservative prediction the way Nate Silver was, I just want to remind everyone that this:

is also a model, based on one monkey’s prophetic foresight.

I believe that predictions like this one and 538’s are more accurate than this monkey, but even the great Nate Silver has been way off before.

If you enjoyed this post, take a look at some of my other Adventures in Statistics.

You can also check out my five-part “Fun with Congressional Data” series, with the links and descriptions posted below.

  1. Party majority after each election in the House/Senate since 1931, and a congressional majority’s connection to changes in real/nominal GDP.
  2. Occupation status of Congress since 1953.
  3. Percentages of Democrats and Republicans in Congress and in the House/Senate since 1857.
  4. The myth of independent representation and choices in 2016.
  5. Amount of money spent on elections by incumbents vs. challengers and its effect on re-election since 1974.

-Andrew G Chapple

