### Daring Players to Shoot

#### by Joseph Ryan Glover

In basketball you often hear about a defender “daring” an offensive player to shoot. This dare can be identified by the distance the defender gives a shooter to take their shot. The reasoning is that if a defender does not consider the offensive player to be much of a shooting threat, they will play off them, giving the shooter space to launch a shot, confident that it will be a miss. In the following analysis I use SportVU data from the 2014-15 NBA season (the last full year of SportVU data available) to look at every three point shot taken and the distance to the nearest defender for each shot. By plotting the three point percentage versus average defender distance for every player with at least 82 three point attempts, I reveal the truth behind “daring to shoot”.

To begin, I needed every three point shot from the 2014-15 NBA season. I downloaded the data from NBAsavant.com using their shot tracker web tool. There seems to be a download limit of 50,000 records so I had to download two separate batches to get all of the 2014-15 season, which had 55,116 three point attempts. By selecting a zone of “three pointer”, defining two game dates that break the season into shorter chunks and by grouping the data by both player and date, I was able to obtain CSV files for the entire season. The CSV file I created is available here.

With the data in hand I then opted to use Python to manipulate the file. I used Python for this part of the analysis to get some practice with Pandas and specifically Pandas pivot tables. The following gist details each of the steps I went through to condition and prepare the data for analysis.

The script opens the CSV file for three point shots and creates two Pandas pivot tables. The first pivots player name against the shots_made_flag, which is either a 1 for a made shot or a 0 for a miss. By both summing and counting the shots_made_flag I get the number of made shots and the number of attempts; this is the code on line 9. The dataframe produced by the pivot has lousy column labels, so the code on line 12 renames them. For the subsequent 3 point % calculation I explicitly converted the columns to numeric types on line 15. Finally, I calculated the 3 point % with a simple function, calling it on the dataframe using the apply() method on line 21.
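The steps in that first pivot can be sketched roughly as follows. This is a minimal illustration rather than the original gist: the rows are invented, the `name` column label and the output labels `made`, `attempts` and `pct` are assumptions, and only `shots_made_flag` comes from the description above.

```python
import pandas as pd

# Toy stand-in for the three point shot export; these rows are invented.
shots = pd.DataFrame({
    "name": ["Player A", "Player A", "Player A", "Player B", "Player B"],
    "shots_made_flag": [1, 0, 1, 0, 0],
})

# Summing the 1/0 flag gives makes; counting it gives attempts.
made_attempts = pd.pivot_table(
    shots, index="name", values="shots_made_flag", aggfunc=["sum", "count"]
)

# The pivot returns two-level column labels; swap in flat, readable names.
made_attempts.columns = ["made", "attempts"]

# Make sure both columns are numeric before doing arithmetic on them.
made_attempts = made_attempts.apply(pd.to_numeric)


def three_point_pct(row):
    """Percentage of attempts that were made, for one player row."""
    return 100.0 * row["made"] / row["attempts"]


# axis=1 passes each row (one player) to the function.
made_attempts["pct"] = made_attempts.apply(three_point_pct, axis=1)
print(made_attempts)
```

A vectorized `100 * made / attempts` would produce the same column; the row-wise `apply()` is shown only because it matches the approach described above.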

On line 25 I created a second Pandas pivot table, this one pivoting on name and calculating the average defender distance for each three point attempt. Notice that I didn’t specify the aggregating function for this pivot table like I did for the first one and that’s because the default pivot aggregation is mean.
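As a small sketch of that default behaviour (again with invented rows, and with `defender_distance` as an assumed column label, since the real one is not named here), leaving out `aggfunc` gives the per-player mean:

```python
import pandas as pd

# Invented rows; "defender_distance" is an assumed column label.
shots = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B"],
    "defender_distance": [4.0, 6.0, 5.5, 7.5],
})

# No aggfunc given, so pivot_table falls back to its default: the mean.
avg_distance = pd.pivot_table(shots, index="name", values="defender_distance")
print(avg_distance)
```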

On line 28 I use the Pandas concat() function to fuse the two dataframes together. Since both are based on player name this has the delightful effect of joining the two pivots by player.
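To illustrate why this works as a join, here is a hedged sketch with two tiny hand-built frames (the values and column labels are made up). It assumes the concatenation is done column-wise with `axis=1`, in which case `concat()` aligns rows on the shared player-name index even when the two frames list players in a different order.

```python
import pandas as pd

# Two hand-built frames indexed by player name; all values are invented.
made_attempts = pd.DataFrame(
    {"made": [2, 0], "attempts": [3, 2]},
    index=pd.Index(["Player A", "Player B"], name="name"),
)
avg_distance = pd.DataFrame(
    {"defender_distance": [5.0, 6.5]},
    index=pd.Index(["Player B", "Player A"], name="name"),  # different order
)

# axis=1 places the frames side by side, matching rows by index label.
combined = pd.concat([made_attempts, avg_distance], axis=1)
print(combined)
```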

I mentioned above that I only wanted to look at players with at least 82 three point attempts on the season. While this may not directly translate to 1 attempt per regular season game, I needed some cutoff to throw out the low end and this seemed like a reasonable number. Line 31 of the code filters out those with fewer than 82 attempts.
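The cutoff amounts to a boolean mask on the attempts column. A minimal sketch, with invented players and an assumed `attempts` column label:

```python
import pandas as pd

MIN_ATTEMPTS = 82  # the cutoff described above

# Invented attempt counts; "attempts" is an assumed column label.
players = pd.DataFrame(
    {"attempts": [114, 81, 82]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="name"),
)

# Keep players with at least 82 attempts, dropping those with fewer.
qualified = players[players["attempts"] >= MIN_ATTEMPTS]
print(qualified)
```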

Line 34 dumps the parsed and conditioned dataframe back out to CSV as I’m going to use R for the visualization (I know, I know, bad form to mix, but I want to practice both Python and R).

For the second part of the analysis I wrote this short R snippet to create a scatter plot of the data:

This code creates a scatter plot of 3 point % versus the mean defender distance for every three point shooter with at least 82 attempts on the season.

The plot has points for every one of the 228 players that meet the criteria as well as a linear model (the blue line) with a 95% interval (the shaded region) created from the data points. What the chart tells us is … a whole lot of nothing.

The mean of the per-player average defender distances is 6.04 feet, and the linear model is practically a horizontal line at the 6 foot mark. Luke Babbitt, with an exceptional 51.8% on 114 shots, is given an average of 7.67 feet, but Lance Stephenson @ 17.1% on 105 shots still gets 6.04 feet, exactly the overall mean. Kobe shot 29.3% on 184 attempts but defenders were in his face at 3.98 feet, and Durant @ 40.3% on 159 shots was also crowded at 4.86 feet. The linear model’s R-squared of 0.00002646 indicates that there is essentially no linear relationship between these two variables. In other words, a player’s 3 point %, a strong indicator of their three point prowess and reputation, does not appear to influence how closely a defender guards them when they are in position to launch a three. What these specific numbers, and the chart overall, seem to suggest is that “daring to shoot” is not really a thing.
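For readers who want to see what an R-squared that small means, here is a rough Python sketch of how the statistic can be computed for a straight-line fit. It is not the R code used for the chart: the `r_squared` helper is hypothetical, and the sample values are invented so that the fitted line comes out flat.

```python
import numpy as np


def r_squared(x, y):
    """R-squared of a least-squares straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # degree-1 (linear) fit
    residuals = y - (slope * x + intercept)     # what the line fails to explain
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation in y
    return 1.0 - ss_res / ss_tot


# Invented values: distances in feet and shooting percentages arranged so
# that the best-fit line has zero slope.
distance = [4.0, 5.0, 6.0, 7.0]
pct = [40.0, 30.0, 30.0, 40.0]
print(r_squared(distance, pct))
```

A value near 0 means the fitted line explains almost none of the variation in shooting percentage, while a value near 1 would mean the points sit almost exactly on the line.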