“Innocent until proven guilty” is a common saying in modern criminal trials, but the concept is centuries old. According to Wikipedia (I know, right? Shame on me for citing Wikipedia), the Romans were the first to use this concept, in the 6th century, with the saying “Proof lies on him who asserts, not on him who denies”. This concept of prior innocence and of who bears the burden of proof fits well within the statistical paradigm in several ways. First, let’s think about the null hypothesis this implies in any trial. We have:
H_o: Person/Entity is Innocent
H_a: Person/Entity is Guilty
This is because we assume that the person is innocent until they are PROVEN to be guilty. This is a good time to bring up one of the common misconceptions students have when they are first learning statistics. When a student is asked to perform a hypothesis test on an examination, say H_o: \mu=20 vs. H_a: \mu \ne 20, a common (incorrect) answer is “We fail to reject H_o so we conclude that \mu=20”. But we don’t know for certain that \mu=20; we just know that there is not enough evidence to conclude that \mu \ne 20. Likewise, just because a trial ends in a not-guilty verdict, that doesn’t imply that the defendant was actually innocent! It just says that the evidence collected, and the manner in which it was presented to that selected jury and judge, was not convincing enough for the person to be deemed guilty.
When we test H_o: \mu=20, we might use a t-test if we make assumptions about normality of the underlying variable. But in this analogy, let’s think about the test only as a “yes/no” answer: we preset our critical value and reject \mu=20 only if the absolute value of our test statistic is greater than that critical value.
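To make the “yes/no” view concrete, here is a minimal sketch of that decision rule in Python. The function names and both data sets are made up for illustration, and the critical value 2.776 is the two-sided t critical value for \alpha=.05 with 4 degrees of freedom, which matches the 5 hypothetical observations used here.

```python
import math
import statistics


def t_statistic(sample, mu_0):
    """One-sample t statistic: (sample mean - mu_0) / (s / sqrt(n))."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (sample_mean - mu_0) / (sample_sd / math.sqrt(n))


def reject_null(sample, mu_0, critical_value):
    """Return True ("reject H_o") only if |t| exceeds the preset critical value."""
    return abs(t_statistic(sample, mu_0)) > critical_value


# Hypothetical data: 5 observations each, so 4 degrees of freedom.
# 2.776 is the two-sided t critical value for alpha = .05 and 4 df.
CRITICAL_VALUE = 2.776

close_to_20 = [19, 21, 20, 22, 18]  # sample mean is exactly 20
far_from_20 = [25, 26, 24, 27, 23]  # sample mean is 25

print(reject_null(close_to_20, 20, CRITICAL_VALUE))  # False: fail to reject H_o
print(reject_null(far_from_20, 20, CRITICAL_VALUE))  # True: reject H_o
```

Note that the first result is “fail to reject”, not “\mu=20 is true”: the function only ever says whether the evidence cleared the bar.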
The test in criminal trials also provides a “yes/no” answer to H_o: Person/Entity is Innocent, by way of the judge’s decision or the jury’s unanimous decision. While the statistical test of \mu=20 is based on statistical theory, the criminal test is based on the evidence, the characteristics of the selected jury (or judge), the performance of the prosecution, the performance of the defense, and nowadays the television, internet and social media coverage (though courts try to withhold this information from jury members). I will call this last factor what it is: bias (akin to statistical bias). While the judicial system tries its best to keep jury members from picking up these biases, in high-profile cases like the O.J. Simpson murder case (The People v. O.J. Simpson was a great watch if you haven’t seen it yet!) it’s not possible to keep all information away from jurors, and many had likely already formed an opinion on the matter from news coverage. Likewise, jury selection can also introduce bias in cases where the defendant’s ethnicity, sexual orientation or gender is not represented in the jury.
Let’s talk about \alpha, a.k.a. the probability of a type I error. While we may never know for sure if O.J. committed the crime, letting a guilty man walk is better than letting an innocent man go to jail (Disclaimer: This is not my perspective on the case). I’ve read quite a few stories about people who were wrongfully convicted and released over 20 years later, after new evidence was discovered or the real culprit was found. Some states provide compensation for your time if you are wrongly convicted, but the amount is not nearly enough for spending 20 years of your life behind bars. In my eyes, this is worse than letting a guilty person walk free, because the guilty person might at least face the punishment of lifelong guilt for their action, while the innocent person would face punishment for something they didn’t do. So if you remember your stat classes in college, we want to set \alpha to be really small. That is, we want the probability of deciding a person is guilty when they are really innocent to be very small! \alpha=.05 will not cut it here, or you would have many innocent people in jail. Think about it: not all type I errors are alike! So we probably want to set \alpha=.001 or something even smaller.
If you’re familiar with statistical tests, you know that with a smaller \alpha it becomes harder to reject H_o in favor of H_a when H_a is in fact true. So while we want to keep innocent people out of jail at all costs, inevitably this will let more guilty people walk free. In testing, we often talk about power, which is the probability of rejecting H_o in favor of H_a when H_a is true. We also talk about tests that maximize power under many alternatives or just one. Power here corresponds to the probability of declaring a person guilty when they really are, so if we want to maximize the power of a criminal trial, we need to collect better evidence, select unbiased (or low-bias) juries and have prosecutors who can effectively communicate the evidence to the jury. With the advent of DNA testing, the power of these trials has gone up (we’re not dealing with super-spy criminals who can frame people like in the movies). Likewise, the prevalence of video camera recordings, cell phone records and credit card records has also increased the power of the criminal trial process over time. This has also decreased the type I error in these trials, because we are less likely to convict an innocent man if, say, his DNA was not at the crime scene or he wasn’t seen on video camera at the scene.
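The trade-off between \alpha and power can be seen numerically. The sketch below is only an illustration and swaps in a simpler setting than a trial (or even the t-test above): a one-sided z-test with known standard deviation, where the “strength of the evidence” is a hypothetical effect size and sample size that I picked arbitrarily.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_one_sided_z(alpha, effect_size, n):
    """Power of a one-sided z-test of H_o: mu = mu_0 vs. H_a: mu > mu_0.

    effect_size is (mu_a - mu_0) / sigma, the true shift in standard-deviation units.
    """
    critical_value = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # Under H_a the z statistic is normal with mean effect_size * sqrt(n) and sd 1.
    shift = effect_size * n ** 0.5
    return 1 - STANDARD_NORMAL.cdf(critical_value - shift)


# Hypothetical evidence strength: effect size 0.5 with 25 observations.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(power_one_sided_z(alpha, effect_size=0.5, n=25), 3))
```

Under these assumptions the printed power drops as \alpha shrinks, which is the “more guilty people walk free” cost. Raising `n` or `effect_size` (the analogue of better evidence, like DNA) pushes the power back up without loosening \alpha.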
I remember thinking about this when I was in high school, during the Casey Anthony verdict (there have been recent allegations that she admitted it to her lawyer), and during other recent not-guilty verdicts that garnered so much backlash from the public. As I read comment after comment on social media about their guilt, I thought about how a guilty person could get away with it. But after reading about the Innocence Project (which uses DNA evidence to exonerate innocent people who were previously convicted on questionable grounds), I changed my stance. You can see some of the statistics on their website about their exonerations here:
As you can see, 342 people have been exonerated since 1989, and the exonerees are disproportionately African American. This further points to the need to remove racial biases from jury members and society in general. Furthermore, the average time served was 14 years!!! Could you imagine spending 14 years of your life behind bars? I tried, and realized that a de facto hypothesis testing framework is absolutely necessary and, if anything, the burden of proof needs to be elevated. Sadly, there’s no clear way to use the beauty of statistics in a formal fashion in these trials, except in exposing the biases in certain trials. For example, a professor at LSU told our class about how he used hypothesis testing to show that, for a certain parish in Louisiana where 40% of residents were African American, an all-white 12-person jury was extremely unlikely to have been randomly selected from the population. In this sense, he argued that the jury that had convicted an African American defendant could not have given a fair verdict. Unfortunately, he said this argument didn’t work in the appeals, even though he believed the man was innocent.
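I don’t know the exact test the professor used, but a back-of-the-envelope version of the jury argument is easy to sketch. The calculation below assumes each of the 12 jurors is drawn independently from a large pool in which 40% of people are African American (a simple binomial model that ignores eligibility rules, strikes and so on); the function name is my own.

```python
def prob_no_members_of_group(group_share, jury_size):
    """Probability that a randomly drawn jury contains nobody from the group.

    Assumes jurors are drawn independently from a large pool, so each juror is
    outside the group with probability (1 - group_share).
    """
    return (1 - group_share) ** jury_size


# H_o: the jury was randomly selected from a population that is 40% African American.
p_value = prob_no_members_of_group(0.40, 12)
print(f"{p_value:.4f}")  # 0.0022
```

Under this simplified model, an all-white jury would come up by chance only about once in every 460 randomly selected juries, which is the kind of evidence against H_o: “the jury was randomly selected” that the professor was describing.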
What I want to leave you with is the following sensibility for our justice system. Just like in statistics, we will NEVER know the truth (unless God tells me that the number of times you get “dropped on” by a bird in your lifetime is Poisson distributed with mean 2). All we can do is try to use the evidence, lawyers and jury (the data and testing procedure) to make the most likely decision while controlling the extreme cases where an innocent person goes to jail or a guilty person walks free (even more so in the first case). But as in statistics, we can’t always control the power of our tests, so our best bet is to control the type I error: the probability that an innocent person is declared guilty. This is why we say defendants are:
“H_o: Innocent until H_a: Proven Guilty”
-Andrew G. Chapple