After several web searches following the terrible mass shooting at the Orlando nightclub on Sunday 6/12/16, I found that there hasn’t been much statistical analysis of the mass shooting phenomenon. Sure, media sites will list the dates, the rates of occurrence and the guns purchased (and whether or not they were purchased legally), but as far as I can find, no one has done a rigorous statistical analysis. First, let’s look at the raw data, which contains every shooting incident that resulted in the injury of 4 or more people. I got this from Mother Jones, which has data on all mass shootings in the U.S. since 1982. Although the data set contains more information about the shootings, I’ll only look at the time of each incident since 1982 and its number of fatalities (and injuries). Below is a scatter plot of the number of injuries over time.

We can see that the attack in Orlando clearly caused the most injuries. On first inspection, it doesn’t seem that these shootings have gotten more dangerous in terms of injuries over time, but the rate of shootings appears to increase around day 8000 (around 2003).

The plot above shows the number of fatalities by time since 1982. Again, we see that fatalities have not increased drastically over time, apart from a few incidents. The rate of these shootings follows the same trend as in the injury plot.

I wanted to see if there were certain times when the incidence of these shootings changed, and to try to connect those changes to policy decisions or world events that occurred around those times. This is particularly interesting because a common assertion is that the rate of mass shootings has increased over the years. I thought there might be a change around the assault rifle ban of 1994-2004.

This is an example of a changepoint problem, a class of problems that has been researched in depth over the years.

Since I want this article to be accessible to typical social media goers and statisticians alike, I am going to list all the math details at the end of the post for those who are interested. But the gist is that I modeled the risk of a mass shooting (i.e. the hazard) assuming that the shootings are independent events (which I hope my colleagues think is a reasonable assumption, meaning we don’t have copycat shooters).

Math: In short, I’m using the reversible jump MCMC approach of Green, which he used to assess the coal mining data. The difference here is that I don’t choose which move to make at random in each iteration; instead, I sequentially adjust the rates, propose the birth of a split point and propose the death of a split point. I use the ICAR formulation here as well, but I will detail everything at the end. Anyway, I ran a Gibbs sampler with a reversible jump step for 1,000,000 iterations and discarded the first 500,000 as burn-in.
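The full reversible jump sampler is too long to show inline, but the core idea can be sketched with a much simpler stand-in: a Gibbs sampler for exactly one changepoint, run on synthetic event times. Everything here is an assumption for illustration (the simulated data, the Gamma(1, 1) priors on the two hazards, the grid of candidate changepoints and all variable names). It is not the sampler behind the results in this post, which uses the ICAR formulation and lets the number of changepoints vary.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic event times (NOT the Mother Jones data) ---
T = 100.0          # length of the observation window
true_tau = 70.0    # changepoint used to simulate the data
rate_before, rate_after = 0.2, 2.0

n_before = rng.poisson(rate_before * true_tau)
n_after = rng.poisson(rate_after * (T - true_tau))
events = np.sort(np.concatenate([
    rng.uniform(0.0, true_tau, n_before),
    rng.uniform(true_tau, T, n_after),
]))
n_total = len(events)

# --- Model: hazard h1 on (0, tau], hazard h2 on (tau, T] ---
# Assumed priors: h1, h2 ~ Gamma(shape=a, rate=b); tau uniform on a grid.
a, b = 1.0, 1.0
tau_grid = np.linspace(1.0, T - 1.0, 199)
# Number of events at or before each candidate changepoint.
counts_before = np.searchsorted(events, tau_grid, side="right")


def gibbs(n_iter=5000, burn_in=1000):
    """Alternate conjugate hazard updates with a grid update for tau."""
    tau = T / 2.0
    draws = []
    for it in range(n_iter):
        n1 = np.searchsorted(events, tau, side="right")
        n2 = n_total - n1
        # Conjugate Gamma full conditionals (NumPy takes scale = 1 / rate).
        h1 = rng.gamma(a + n1, 1.0 / (b + tau))
        h2 = rng.gamma(a + n2, 1.0 / (b + T - tau))
        # Discrete full conditional for tau from the Poisson process likelihood.
        log_p = (counts_before * np.log(h1) - h1 * tau_grid
                 + (n_total - counts_before) * np.log(h2)
                 - h2 * (T - tau_grid))
        p = np.exp(log_p - log_p.max())
        tau = rng.choice(tau_grid, p=p / p.sum())
        if it >= burn_in:
            draws.append((tau, h1, h2))
    return np.array(draws)


samples = gibbs()
tau_mean, h1_mean, h2_mean = samples.mean(axis=0)
ratio_mean = np.mean(samples[:, 2] / samples[:, 1])
print(f"posterior mean changepoint: {tau_mean:.1f}")
print(f"posterior mean hazard ratio (after / before): {ratio_mean:.1f}")
```

With a fixed number of changepoints every step is an ordinary Gibbs update. The reversible jump part is only needed once split points can be born or die, because those moves change the dimension of the parameter vector.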

The Results

OK, so the first question this analysis can answer is: have there actually been changes in the rate of mass shootings? My guess was yes based on the scatter plots above, but what do the data have to say? Below is a histogram of the posterior distribution of the number of changepoints. (For you laymen, a posterior distribution, in a nutshell, shows how plausible each number of changepoints is given the data. But don’t get caught up on that; let’s look at the graph.)

So the shooting data indicate that there is only one changepoint in the risk of a mass shooting occurring. In the posterior sample, there were 464,484 draws with one changepoint, 35,461 with two changepoints and only 55 with three changepoints. I’ll focus on the draws with one changepoint, since that case clearly occurred with the highest frequency.
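Dividing those counts by the 500,000 retained draws gives the posterior probability of each number of changepoints. A minimal sketch of that arithmetic (the variable names are mine, chosen for illustration):

```python
# Posterior draw counts quoted above, keyed by number of changepoints.
counts = {1: 464_484, 2: 35_461, 3: 55}

total = sum(counts.values())
probs = {k: n / total for k, n in counts.items()}

print(f"retained draws: {total}")
for k, p in probs.items():
    print(f"P({k} changepoint(s) | data) = {p:.5f}")
```

The counts add up to exactly the 500,000 post burn-in iterations, and roughly 93% of them have a single changepoint.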

So now we need to ask when the incidence of mass shootings changed and how it changed (increased or decreased risk). The plot below shows the density of the changepoint in the posterior sample across time.

This is the kicker right here. You can see that the change in the rate of shootings took place in late 2011 or early 2012. There is no density before 2005. In fact, the mean of the changepoint sample is 2011.589 and the median is 2011.72, so it is pretty evident that the rate of mass shootings changed around 2011-2012. But how did it change? Was there an increase or a decrease in shooting incidence?

Before Changepoint hazard of mass shooting: 8.378138e-05

After Changepoint hazard of mass shooting: 0.001082364

The ratio of these two numbers (the hazard ratio) is 12.91891, so the hazard of a mass shooting became nearly 13 times higher after 2011-2012, an increase of roughly 1,190%!! This is astonishing and, frankly, scary. Check out the plots of the distributions of these hazards below; the red is post changepoint and the black is pre changepoint.

They don’t touch AT ALL! There is a huge increase in the risk of mass shootings since around 2011-2012.
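For anyone who wants to check the arithmetic, a hazard ratio converts to a percent change as (ratio - 1) x 100. A small sketch using the two posterior hazard values quoted above (the variable names are mine):

```python
# Posterior hazard estimates quoted above.
hazard_before = 8.378138e-05
hazard_after = 0.001082364

hazard_ratio = hazard_after / hazard_before
percent_increase = (hazard_ratio - 1.0) * 100.0

print(f"hazard ratio: {hazard_ratio:.5f}")
print(f"percent increase: {percent_increase:.1f}%")
```

A ratio near 12.9 means the hazard is about 12.9 times its earlier value, which is an increase of just under 1,200%.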

But what could have caused this? Gun sales made a huge jump to record highs in 2010, then again in 2011 and once again in 2012. This trend can be seen here.

Now I’m just grasping at straws here for explanations of why we see what we do, but the trend is clear: there has been a historic and appalling increase in mass shootings since around 2011-2012, and we can’t afford to let it go any higher.

Also, surprisingly, there wasn’t a change during the assault rifle ban, but that doesn’t necessarily mean that an assault rifle ban wouldn’t decrease mass shootings (or at least their deadliness, but that’s another statistical analysis).

If you enjoyed this post, take a look at some of my other Adventures in Statistics.

-Andrew G. Chapple

P.S. For those of you wondering about my model, here’s a good summary:

I assume that the hazard of a mass shooting is piecewise constant, taking the value exp(\lambda_j) on the disjoint intervals defined by 0<s_1<s_2<…<s_J<s_max = date of the Orlando shooting, and that we observe a shooting time Y_i in one of these intervals (s_{j-1},s_{j}]. The survival function implied by this hazard and the corresponding likelihood is:
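In standard piecewise exponential notation this can be sketched as follows, taking s_0 = 0 and s_{J+1} = s_max as the end points and treating the n shooting times as independent. Both are assumptions of this sketch, as is the helper term \Delta_j(t), which measures how much of interval j lies before time t.

```latex
h(t) = \exp(\lambda_j), \qquad t \in (s_{j-1}, s_j], \quad j = 1, \dots, J+1

\Delta_j(t) = \max\{\, 0,\ \min(t, s_j) - s_{j-1} \,\}

S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)
     = \exp\!\left( -\sum_{j=1}^{J+1} \exp(\lambda_j)\, \Delta_j(t) \right)

L(\lambda, s \mid Y) = \prod_{i=1}^{n} h(Y_i)\, S(Y_i)
```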

And I use the prior distributions:

Where Sigma_s is formulated via the intrinsic conditional autoregressive (ICAR) model in the following way:

If requested, I can add the code I used for the MCMC to CRAN and detail the proposals and acceptance ratios I used.