SLAM NBA Top 100 All-Time Players: How many played each year?

I’m fired up for the finals tonight, hoping the Cavs keep this series close (and JR knows the score), so I felt like writing a little about the NBA.

There’s often a lot of talk about which era of the NBA was the best. We could look at this as a function of which teams were the most dominant (which could be assessed via Elo ratings) or of how many great players played in each season. While the latter doesn’t account for players being in their primes or declines, it can give us some indication of how much talent was featured in each NBA season.

SLAM magazine recently released its new top 100 players of all time list. The list can certainly be argued with, but it at least gives a critical assessment and ranking of players since the NBA’s founding. ESPN did something similar a few years back, but SLAM’s list is more recent. You can see SLAM’s list below.

[Image: SLAM’s top 100 players of all time list]

I’ll say again that I don’t personally agree with everything on this list (Uhhhh where is Anthony Davis??), but it does give a ranking of players from people who cover the game. Below, I looked at the number of top 100 players on this list who played in each season from 1980-2010. I cut off at 2010 since we are still watching many players’ careers unfold, which may drastically change these ranks by, say, 2025.

[Figure: Number of top 100 players who played in each season, 1980-2010]

We see a major increase in the number of top 100 players in the league from 1996-1999 (yes, during Jordan’s second 3-peat), topping out with 32 top 100 players participating in the 1998-1999 season. 1981 featured only 21 of the top 100 players, the fewest among the years considered. The 1990s, on average, had the most top 100 players per season, with a mean of 27.5 players. This was followed by the decade from 2000-2009 with 25.6 top 100 players per season, with the 1980s last at 24.3 top 100 players per season.

Below is a plot featuring the total rank scores of the players in each season for this time period. In this case, Michael Jordan gets a rank score of 100 since he’s ranked #1, while Shawn Kemp, ranked #100, gets a rank score of 1.

[Figure: Sum of the top 100 players’ rank scores by season, 1980-2010]

Again we see that the period from 1996-1999 had the highest rank sums, with the maximum rank sum value of 1556 seen in 1997. 1981 had both the lowest count of top 100 players and the lowest rank sum from 1980-2010. We also see a steeper increase at the end of the 2000s in the rank sum than in the number of top 100 players, which is likely due to Kevin Durant, Russell Westbrook, and Stephen Curry entering the league in consecutive seasons from 2007-2009.

Below is a table featuring the [minimum – mean – maximum] values for both the # of top 100 players per season and the sum of the players’ rank scores per season.

Decade      # of top 100 players    Sum of rank scores
1980-1989   21 – 24.3 – 27          1053 – 1338 – 1556
1990-1999   23 – 27.5 – 32          1291 – 1475 – 1698
2000-2009   23 – 25.6 – 29          1181 – 1349 – 1574

Looking at the whole picture, according to SLAM magazine, the 1990s were clearly the most dominant decade, particularly as evidenced by the sum of the player ranks. The 1990s had an average rank sum 126 points higher than the 2000s, which is fascinating considering that Michael Jordan, who had a rank score of 100, did not play in two of these seasons (yes, coming back for the 1995 playoffs counts…). Comparing the 1980s to the 2000s, we see similar numbers of top 100 players and similar rank sums, although the 2000s did have a much higher floor than the 1980s in terms of rank sum.
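For anyone curious how these season-by-season numbers could be tabulated, here is a rough sketch in R using a made-up stand-in for the SLAM list (the top100 data frame and its career-span columns are invented purely for illustration):

set.seed(1)
# Hypothetical example data: one row per ranked player, with made-up career spans.
top100 <- data.frame(rank = 1:100,
                     first_season = sample(1960:2005, 100, replace = TRUE))
top100$last_season <- top100$first_season + sample(8:19, 100, replace = TRUE)

# Rank score = 101 - rank, so the #1 player scores 100 and the #100 player scores 1.
top100$score <- 101 - top100$rank

seasons  <- 1980:2010
counts   <- sapply(seasons, function(yr)
  sum(top100$first_season <= yr & top100$last_season >= yr))
rank_sum <- sapply(seasons, function(yr)
  sum(top100$score[top100$first_season <= yr & top100$last_season >= yr]))

plot(seasons, counts,   type = "b", xlab = "Season", ylab = "# of top 100 players")
plot(seasons, rank_sum, type = "b", xlab = "Season", ylab = "Sum of rank scores")

# Decade summaries (min, mean, max), e.g. for the 1990s:
range(counts[seasons %in% 1990:1999]); mean(counts[seasons %in% 1990:1999])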

This is all contingent on SLAM’s ratings, but I suspect we would see a totally different list if it were made in 2028, when critics will have had the proper time to reflect on the decade of basketball we are currently in.

If you enjoyed this post, check out some of my other posts!

-AGC

Keywords: Best NBA Players, Top NBA Players of All Time, Whats the best NBA Era


Fun with Data: Is Blake Bortles’ contract that bad? A look at 2018 cap hits and QBR

Blake Bortles may not have dominated the 2017-2018 NFL playoffs, but he did take the Jaguars to the AFC Championship Game and nearly did enough to beat the Patriots. The Jaguars rewarded his play with a 3-year, $54 million contract. Initially, I was aghast. At times it seemed that the Jaguars won in spite of Bortles: he threw only 1 touchdown in the two playoff wins, with 2 interceptions and a combined 48% completion percentage. His play over the season also wasn’t spectacular.

So I wondered, how bad is this contract?

Turns out, it’s actually a pretty great contract in comparison to other teams’ quarterback deals, at least in terms of the salary cap hit for next season. I got my QBR data from ESPN and my cap hit data from Over The Cap.

Bortles is slated to cost the Jaguars $10 million in cap space next season, which is the 12th lowest out of the 25 qualifying players from ESPN who are currently under contract for next season. Below is a plot of these 25 players’ quarterback rating (QBR) as a function of their cap hit (in millions).

[Figure: QBR vs. 2018 cap hit (in millions) for the 25 qualifying quarterbacks]

Bortles is seen here in green, and I added the least squares regression line for QBR as a function of cap hit. Bortles actually performed better last season than the line predicts for a quarterback with a $10 million cap hit. In case you’re wondering, Carson Wentz had the highest QBR last season, and was a bargain. I added a point for Kirk Cousins in red, reflecting his possible record-breaking $30 million contract; based on his play in 2017 and the least squares regression line, he would be overpaid. This got me thinking: what are some of the best and worst QBR/salary ratios going into next season? Below are the top 5 ratios:

Quarterback        QBR/Salary ratio
Dak Prescott 91.9
Brett Hundley 54.1
Jacoby Brissett 52.1
Deshone Kizer 26.1
Trevor Siemian 14.5

It’s probably not surprising that all five of these players are on rookie contracts. And here are the bottom five ratios in the league.

Quarterback        QBR/Salary ratio
Kirk Cousins 1.7
Cam Newton 1.7
Derek Carr 1.9
Eli Manning 2.0
Joe Flacco 2.2

Assuming that Kirk Cousins is paid $30 million, he would have the lowest QBR/salary ratio in the league. The rest of this list is also probably not surprising, as these quarterbacks received massive long-term contracts but have played poorly of late.
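If you want to put together a similar comparison, here is a rough sketch of the approach in R, using made-up numbers in place of the ESPN QBR and Over The Cap figures (the qbs data frame and its columns are invented for illustration):

set.seed(1)
# Hypothetical stand-in data: 25 quarterbacks with cap hits (in $M) and QBR values.
qbs <- data.frame(player  = paste("QB", 1:25),
                  cap_hit = runif(25, 1, 30))
qbs$qbr <- pmin(pmax(40 + 1.2 * qbs$cap_hit + rnorm(25, sd = 15), 5), 95)

# Least squares fit of QBR on cap hit, and the predicted QBR at a $10M cap hit.
fit <- lm(qbr ~ cap_hit, data = qbs)
predict(fit, newdata = data.frame(cap_hit = 10))

# QBR per million dollars of cap hit; sort to find the best and worst "value" deals.
qbs$ratio <- qbs$qbr / qbs$cap_hit
head(qbs[order(-qbs$ratio), c("player", "ratio")], 5)   # top 5 ratios
head(qbs[order(qbs$ratio),  c("player", "ratio")], 5)   # bottom 5 ratios

plot(qbs$cap_hit, qbs$qbr, xlab = "2018 cap hit ($M)", ylab = "QBR")
abline(fit)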

I hope you enjoyed this post and if so, check out some of my other blog posts!

-Andrew G Chapple

Keywords: Is Blake Bortles Good, Worst NFL Contracts, Best NFL Contracts, Best NFL QB Contracts, Worst NFL QB Contracts, Best Quarterback Contracts, Best Quarterback in NFL

A look at rank and point differentials between LSU and Alabama in the Saban era

Today marks the 12th time that LSU will play Alabama since Nick Saban took over as head coach; LSU has won only 3 of the past 11 matchups and has lost 6 in a row. Over these 11 games, Alabama has outscored LSU by an average of 7.2 points per game, with an average victory margin of 11.5 points in its 8 wins.

I went back and looked at each team’s ranking in these games, found on ESPN’s team/year pages (it really didn’t take but 10 minutes to tabulate), to see if there were any interesting trends. In only four of the 11 games was LSU ranked better than Alabama, with LSU winning the first two and losing the later two of those outings. On average, Alabama has been ranked 4.2 spots higher than LSU.

Below is a plot of the Alabama – LSU point differentials as a function of the rank differentials, where each black point represents a game played against a Saban-coached Alabama team. If Alabama won, the point differential is positive, and if Alabama was ranked higher, the rank differential is negative. For example, this year Alabama is ranked #2 and LSU is ranked #19, so the rank differential is 2 - 19 = -17, the biggest gap since Saban took over. The three large triangles represent the three overtime games, and the purple dot is the predicted point differential for today’s game.

[Figure: Alabama – LSU point differential vs. rank differential in the Saban era]

The two black lines mark zeroes: points to the right of the vertical line indicate that LSU was ranked higher, while points below the horizontal line indicate an LSU victory. I fit a linear regression model to these data to predict the point differential of today’s game from the rank differential, and this model estimates that Alabama will win by 12.9 points. The regression line is plotted above in purple, with intercept 5.30 and slope -0.44.

For lagniappe, I fit a logistic regression model to the rank data to get an estimated probability of victory for LSU. ESPN says that LSU has a 6.9% chance of winning, while the usual logistic regression model on the rank data gives LSU only a 1.2% chance of winning. Neither suggests a positive outcome for the Tigers, but upsets do happen.
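Both models are straightforward to fit in R. Here is a hypothetical sketch with a made-up game log standing in for the 11 tabulated games (the rank_diff, point_diff, and lsu_win columns are invented for illustration, so the fitted coefficients will not match the ones above):

# Made-up game log: Alabama-minus-LSU rank and point differentials, plus an LSU win indicator.
games <- data.frame(
  rank_diff  = c( 3,  6, -2, -5,  2,  4, -8, -4, -10, -7, -1),
  point_diff = c(-9, -6, 10,  7,  4,  7, 14,  3,  21, 17, -3))
games$lsu_win <- as.numeric(games$point_diff < 0)

# Linear regression of point differential on rank differential.
# With an intercept near 5.30 and a slope near -0.44 (as in the fitted model above),
# a rank differential of 2 - 19 = -17 gives 5.30 - 0.44 * (-17), roughly 12.8 points for Alabama.
lin_fit <- lm(point_diff ~ rank_diff, data = games)
predict(lin_fit, newdata = data.frame(rank_diff = -17))

# Logistic regression for the probability that LSU wins, given the rank differential.
log_fit <- glm(lsu_win ~ rank_diff, data = games, family = binomial)
predict(log_fit, newdata = data.frame(rank_diff = -17), type = "response")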

Who knows, an upset here might be karma repaid for the loss to Troy, but what a strange season that would be…

Geaux Tigers!!

If you enjoyed this post, check out some of my other posts!

-Andrew G. Chapple

Keywords: Nick Saban vs LSU, Best SEC Teams, Alabama vs LSU, LSU vs Alabama, SEC Football History, College Football History, Les Miles Record, Nick Saban Record, Nick Saban record vs LSU


Adding common legends to multiple R plots

Recently I submitted a paper to a journal that required compound figures with multiple plots to be uploaded as a single combined .eps file. This turned out to be daunting, as I was trying to combine 4 figures, each requiring a legend. Doing this in R using par(mfrow=c(2,2)) produced plots that were unreadable due to the shrunken size of each plot and of the legend attached to any one of them. Likewise, there were not many good online options for combining the plots’ .eps files into one .eps file.

The solution: xpd = NA in the legend options.

The following code was used to add a common legend to the combined figure:

# xpd = NA turns off clipping, so the legend can be drawn outside the plot region.
legend(-.35, 1.5, legend = c("0", "1", "2", "3"),
       lty = c(2, 3, 4, 5),
       title = "Subgroup", xpd = NA)

Most of the time this call produces a warning or error message, but it still does what we want, producing the following graphic.

[Figure: Four plots sharing one common legend]

So we have the desired output: one common legend in the middle of the four plots. The legend coordinates here were found by trial and error. Different sets of graphics have different coordinates corresponding to (0,0) because the legend’s x and y are interpreted in the user coordinate system of the most recently drawn plot, so you need to move the legend around until you get what you want. The following two graphics show some of the trial and error it took to get the legend placed correctly.

It usually doesn’t take too long to get it placed right.
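Putting it all together, here is a minimal self-contained sketch of the approach with made-up data. This version parks the shared legend in a bottom outer margin rather than the center, and the exact coordinates still take some trial and error:

set.seed(1)
x <- 1:10

# Four panels; each draws the same four line types that the shared legend describes.
par(mfrow = c(2, 2), oma = c(5, 0, 0, 0))    # extra outer margin at the bottom
for (panel in 1:4) {
  plot(x, x + rnorm(10), type = "l", lty = 2, ylim = c(0, 25),
       main = paste("Plot", panel), xlab = "x", ylab = "y")
  for (j in 2:4) lines(x, j * x / 2 + rnorm(10), lty = j + 1)
}

# xpd = NA disables clipping, so the legend can sit in the outer margin.
# Its x/y are in the user coordinates of the last panel drawn (the bottom-right one).
legend(-1, -12, legend = c("0", "1", "2", "3"), lty = c(2, 3, 4, 5),
       title = "Subgroup", horiz = TRUE, xpd = NA)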

I hope this is useful! And if not, it is at least here as a reference for me.

If you liked this post, check out some of my other blog posts.

-Andrew G. Chapple

Keywords: Multiple Plots in R, Legends for multiple plots, Plotting in R, How to plot several plots in R, Using the legend function for R plots

Who should win the 2016-2017 NBA MVP?

By now I’m sure you’ve seen a case for the four big candidates: Russell Westbrook, James Harden, LeBron James, and Kawhi Leonard, or maybe just a breakdown of who should win. With the voting process ending today at 3 PM Eastern, I’m going to throw my own hat into the who-should-win sweepstakes.

Even though I reside in Houston, I’m going to try my best to be unbiased here and just look at the numbers.

First I’m going to look at something most analysts haven’t: statistics against playoff teams. How well do these candidates play against the top competition? I downloaded the data from Basketball-Reference.com and looked at each candidate’s average statistics in games against playoff teams; each player played at least 37 such games.

[Table: MVP candidates’ average statistics in games against playoff teams]

This table should immediately disqualify James Harden based on his negative Real +/- in these games, a statistic that measures how the point margin changes over 100 possessions when the player is in the game versus when he isn’t; it incorporates the team’s defensive efficiency when that player is on the floor. When Harden was on the floor against playoff teams, the average margin was 0.3 points worse than when he was off the floor. All the other MVP candidates have a positive +/- against playoff teams. Since these are means, they are heavily influenced by especially poor performances, but the medians tell the same story: Harden = 0, Leonard = 1, LeBron = 2, Westbrook = 3. Harden’s defensive liability hurts him here. Westbrook had 6 poor games with a +/- worse than -19.

Now that Harden is out, LeBron also gets the boot after going 12-15 since the All-Star break. You cannot collapse down the stretch like that.

Now we’re down to Westbrook versus Leonard.

Against playoff teams, Leonard has a higher win percentage, a higher mean +/- (but not median), a higher average FG%, and a lower turnover margin. Westbrook leads Leonard by a pretty large margin in the three major statistical categories in these games.

Obviously we know that Westbrook had the better season in terms of raw statistics, averaging a triple-double, leading the NBA in scoring, and breaking the record for the most triple-doubles in a season. It has been mentioned that his teammates helped Westbrook get these records. Leonard is the league’s best two-way player on the second-best team in the NBA, while Westbrook’s Thunder finished 10th overall. But who’s more valuable to his team?

In games against playoff teams where Westbrook had a +/- below 0, the Thunder won only 5.26% of the time, compared to the Spurs winning 31.25% of such games for Leonard. When Westbrook played awfully, with a +/- below -10 (which happened 9 times), the Thunder did not beat a single playoff team, while the Spurs won once in the three games Leonard played that poorly. Looking at the entire season, the Spurs won 34.78% of their games when Leonard had a negative +/-, while the Thunder won 13%.

This indicates that Westbrook’s play is much more important to the Thunder’s success. When he plays poorly, the Thunder win less often than the Spurs do when Leonard plays poorly. It’s also worth mentioning that the Spurs won 5 of the 6 games they played without Leonard this year. I’d venture to say that the Spurs would still be a playoff team without Leonard (you know they’d be better than the Trail Blazers), but the Thunder would be bottom feeders if Westbrook had left with Durant.
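For reference, here is a rough sketch of how these conditional win percentages could be computed from a game log in R (the data frame and its columns are made up for illustration):

set.seed(1)
# Hypothetical game log for one player: plus/minus, whether the team won,
# and whether the opponent was a playoff team.
games <- data.frame(plus_minus      = round(rnorm(80, mean = 3, sd = 12)),
                    team_win        = rbinom(80, 1, 0.6),
                    vs_playoff_team = rbinom(80, 1, 0.5))

# Win percentage against playoff teams when the player's plus/minus was negative.
neg_vs_playoffs <- games$vs_playoff_team == 1 & games$plus_minus < 0
mean(games$team_win[neg_vs_playoffs]) * 100

# Win percentage against playoff teams when the plus/minus was worse than -10.
bad_vs_playoffs <- games$vs_playoff_team == 1 & games$plus_minus < -10
mean(games$team_win[bad_vs_playoffs]) * 100

# Season-long win percentage in all games with a negative plus/minus.
mean(games$team_win[games$plus_minus < 0]) * 100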

So for me the MVP is Russell Westbrook.

If you enjoyed this post, check out some of my other blogs!

-Andrew G Chapple

Keywords: Who won 2016-2017 NBA MVP, Russell Westbrook MVP,  Russell Westbrook MVP Statistics, 2016-2017 NBA MVP, Thunder MVP, Best player on the Thunder


Package SCRSELECT: Using ReturnModel()

Here you can find more details about how to use the SCRSELECT package, particularly its main function, ReturnModel(). I will later post a more advanced tutorial here for users who want to run SCRSELECT() or DICTAUG() manually.

The package documentation can be found on CRAN here. This code is related to a recent paper, “Bayesian variable selection for a semi-competing risks model with three hazards,” which is joint work with Dr. Marina Vannucci, Dr. Peter Thall, and Dr. Steven Lin.

This function performs Bayesian variable selection on a semi-competing risks model: it gives the posterior probability of inclusion for each variable in each of the three hazard models and determines the optimal final model based on thresholds chosen via the DIC statistic. More details can be found in the paper, which will appear in print in August.

The package has 4 main functions:

  • SCRSELECT – Performs Stochastic Search Variable Selection (SVSS) on the semi-competing risks model, saving posterior samples to a given location along with summaries of the variable selection parameters and MCMC acceptance rates.
  • DICTAUG – Takes posterior means from SCRSELECT and performs a grid search to find the optimal variable inclusion thresholds for each hazard.
  • ReturnModel – Uses SCRSELECT and DICTAUG to determine the best final model, then performs a final MCMC to obtain inference.
  • SCRSELECTRUN – Used internally by the ReturnModel function.

The most useful function here is ReturnModel(), as it performs SVSS, finds the optimal final model (i.e., which variables are included), and then returns important posterior summaries while also writing the posterior samples to a desired folder location. I’ll briefly go over the inputs (with a small simulated-data sketch after the list) before looking at the output and explaining it.

  • ReturnModel(Y1,I1,Y2,I2,X,hyper,inc,c,BSVSS,BDIC,Path)
    • Y1: Vector of non-terminal event (e.g., effusion) or death times.
    • I1: Vector of indicators for a non-terminal event.
    • Y2: Vector of death or censoring times.
    • I2: Vector of death indicators.
    • X: Covariate matrix; the last inc columns are never selected out and appear in all three hazards of any possible model.
    • hyper: Large vector of hyperparameters; see the documentation.
    • inc: How many covariates should be excluded from selection.
    • c: Shrinkage hyperparameter; values between 10 and 50 are suggested.
    • BSVSS: How many iterations to run the SVSS (suggested: 100,000).
    • BDIC: How many iterations to run the DIC search (suggested: 10,000).
    • Path: Where to save the posterior samples of the final model.
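Here is a small sketch of what these inputs might look like with simulated noise (the data below are made up, and the hyperparameter vector is left as a placeholder, so see the CRAN documentation for its required layout):

set.seed(1)
n <- 100

# Covariates: 7 columns of noise; with inc = 2 the last two are never selected out.
X <- matrix(rnorm(n * 7), nrow = n)

# Hypothetical event-time data (pure noise, for illustration only).
Y1 <- rexp(n)                 # non-terminal event / death times
I1 <- rbinom(n, 1, 0.5)       # non-terminal event indicators
Y2 <- Y1 + rexp(n)            # death / censoring times
I2 <- rbinom(n, 1, 0.5)       # death indicators

# library(SCRSELECT)
# hyper <- c(...)             # see the package documentation for the required entries
# ReturnModel(Y1, I1, Y2, I2, X, hyper, inc = 2, c = 20,
#             BSVSS = 10000, BDIC = 1000, Path = "~/SCRSELECT-output")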

Next, I will show what the program’s output looks like. I used the example code given on CRAN, but ran the chains for a more reasonable amount of time: I let BSVSS = 10,000 and BDIC = 1,000, meaning the variable selection MCMC is carried out for 10,000 iterations and the DIC-tau_g procedure is run for 1,000 iterations. I set inc = 2, so the SVSS always includes the last two variables in each hazard.

[Output: SVSS chain progress and marginal posterior probabilities of inclusion]

First the program runs two different chains through the SVSS procedure and saves the relevant results. The progress of each chain is printed every 5,000 iterations. After both chains finish, the important posterior quantities of the combined sample are saved and the marginal posterior probabilities of inclusion are printed for each hazard. Given that this example data is just simulated noise, it’s not surprising that we don’t see some variables standing out as more important than others. After these values are printed, the DIC-tau_g procedure begins by searching over (.05, tau_2, tau_3) for the optimal DIC, then repeats after increasing tau_1 by .05.

[Output: DIC grid search results and final model variable inclusion]

Once the grid search finishes, the program outputs the DIC value for the optimal model(s); it prints more than one value if two tau vectors produced DIC values within 1 of each other. Next, the final model’s variable inclusion is printed, where a TRUE indicates that the variable is included in that specific hazard. We can see, for example, that only variable 4 is included in the first hazard.

Next the program takes this final model and performs MCMC to obtain important posterior quantities for inference. The first thing printed is the posterior probability that each covariate increases a hazard; for example, variable 7 (which was not allowed to be selected out) had a posterior probability of .73 of increasing the hazard of a non-terminal event. Likewise, variable 2 was shown to be prognostic for death after a non-terminal event (P = .83). The last bit of output from the function is shown below (excuse the output format, I’ll fix that up!):

[Output: Posterior median hazard ratios with 2.5% and 97.5% quantiles]

We are left with the posterior median hazard ratios in the first line for each hazard, followed by the 2.5% and 97.5% quantiles, respectively. Once again, notice that variables not included in the final model have NA listed in their posterior summaries.

I hope this helps clear up how to use the package! As a practical note, I would suggest picking a value of c and checking what the marginal posterior probabilities of inclusion look like after the first step. You want to encourage separation: a small value of c that leads to uniformly very large marginal posterior probabilities of inclusion is as unhelpful as a large value of c that leads to uniformly very small ones. Ideally, you want to see some variables with high inclusion probabilities and some with low ones in each of the three hazards.

I also want to note that while I do not have guides here for the functions SCRSELECT() and DICTAUG(), these are pieces of the ReturnModel() function. An experienced user of ReturnModel() might want to investigate the posterior distribution from the SVSS procedure further, which is what SCRSELECT() is for, since it saves all posterior samples. They could then save the important quantities and run DICTAUG() manually, although I don’t suggest this for anyone not already experienced with ReturnModel().

If you have any questions don’t hesitate to ask! Also if you enjoyed this post, check out some of my other blog posts.

Happy Coding!

-Andrew G Chapple

Audio Slides: Bayesian Variable Selection for a semi-competing risks model with three hazard functions

This is the link for a 5-minute video summary of my latest paper: Bayesian Variable Selection for a semi-competing risks model with three hazard functions.

http://audioslides.elsevier.com//ViewerLarge.aspx?source=1&doi=10.1016/j.csda.2017.03.002

You can find the link to the paper below once it’s uploaded, which should be sometime soon.

If you enjoyed this check out some of my other posts!

-Andrew G Chapple