# Interesting Statistics Papers: Hidden Markov Models in Tennis

I’ve been taking a computational biology class where we use Hidden Markov Models to predict things like whether a part of a DNA sequence is a promoter, coding region, polya or non-coding (intergenic) region. You can read a little more about HMMs here. They  also have big applications in finance but the paper I’m discussing here is more related to the biological models.

For the DNA the basic idea is for each of the 4 states= promoter, coding region, polya, intergene they can each emit nucleotides = A,T,G,C. So the HMM takes a DNA sequence ( for example x=AACGTAGCC…) and for each letter tries to predict if it’s a promoter, coding region, polya or intergene region.

I’ve been thinking about how these types models can be applied to sports. For example, can we tell if a team is on a hot streak or slump based on game play or game to game play?

http://xm2vtsdb.ee.surrey.ac.uk/CVSSP/Publications/papers/kolonias-sspr04.pdf

By Kolonias et al. uses hidden markov models to predict who will receive a tennis point based on events that happen within each round. The big idea is what moves can a tennis player make in terms of returning balls or serving that will increase (decrease) their (their opponents) probability of scoring.

They do this by making 5 major states within their HMM: 1st serve, second serve, ace, rally and point scored where the first 4 each have sub markov models.

The main transitions are determined by the rules of tennis as follows. After a serve (1st or second), you can either fault and serve again, score on the serve (ace), have the serve returned to start a rally, or double fault and get a point scored against you. From the first serve state you can transition to any of the other states – 2nd serve, Ace, Rally, Point while from the second serve you can’t transition back to the first serve. Ace and Rally states are always ended by a point scored.

The real wow factor of this model comes in the sub markov models.

Let’s take a look at the graphic for the Rally sub model shown in the paper:

The ball can rally back and forth until an unforced error is made or a point is forced. Surely this could be complicated by adding direction of the shot or how hard they hit it back.

I’m interested in how this type of model works in predicting whether or not a point is scored in basketball. For example how many passes is too many and how many will lead to a high scoring probability. How effective are ball screens followed by a pass? How much does different amounts of spacing effect scoring in several common plays (give and go, hero ball etc)

These are questions that could be answered using this type of model.

Anyways, I thought this was a neat idea and it was nice to see Hidden Markov Models outside of The human genome.

If you liked this check out some of my other posts!

-Andrew G Chapple