Quantifying Streakiness in NBA 3-Point Shooters

by Joseph Ryan Glover

In basketball parlance, a streaky shooter is a player whose shooting effectiveness is inconsistent: some games they are hitting nearly all their shots and other games they are stone cold. There’s a lot of talk around the Internet about streaky shooters and a lot of top 10 and top 15 lists are made (pity J.R. Smith). Despite the volume of listicles published I didn’t find a lot of quantitative analysis to back up what the authors feel are the league’s streakiest shooters and I aim to rectify that. In this post I lay out a method for quantifying player streakiness.

Since this is a first attempt at a new analytical method I am going to keep my focus narrow. I am going to only look at the 2015-2016 season, I am only going to look at the streakiness of 3-point shots and I am only going to look at players with at least 100 3-point attempts in the season. Lest you think this is too limiting, rest assured that even with these limits 212 players have made the cut.

Other analysts have looked at streakiness in the past and their attempts involved analyzing the standard deviation of a player’s per game shooting percentage. This makes sense if you want to study streakiness at a game level; players with a larger standard deviation have large swings in performance from game to game (i.e. are streakier). My issue with this approach is that it fails to account for longer hot and cold streaks that span multiple games. My proposed solution examines a player’s shooting performance across a season as a continuous, ordered stream of made or missed shots. This approach allows for sustained periods of hot or cold shooting to be identified and also allows me to normalize performance so that players can be compared.
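For comparison, here is a rough sketch of the per-game approach described above. It is only my reading of that method: the function name and the toy game logs are made up for illustration, and I use the population standard deviation, which may not be what other analysts chose.

```python
from statistics import pstdev
from typing import List, Tuple


def per_game_pct_std(games: List[Tuple[int, int]]) -> float:
    """Standard deviation of per-game 3-point percentage.

    `games` is a list of (makes, attempts) pairs, one per game.
    Games with zero attempts are skipped since they have no percentage.
    """
    pcts = [makes / attempts for makes, attempts in games if attempts > 0]
    if len(pcts) < 2:
        raise ValueError("need at least two games with attempts")
    return pstdev(pcts)


# Hypothetical game logs, not real player data.
steady_games = [(2, 4), (3, 6), (1, 2)]   # 50% every night
swingy_games = [(0, 4), (4, 4)]           # 0% one night, 100% the next

steady_std = per_game_pct_std(steady_games)
swingy_std = per_game_pct_std(swingy_games)
```

Notice that this number only knows about game boundaries: a cold spell that starts late in one game and carries into the next is split across two percentages.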

Before I get into the analysis I need to comment about my interpretation of streakiness. If a player goes 7-for-8 from 3 in a game we would agree that he had a hot hand that game, that he was on a streak. That determination is independent of where in the sequence of 8 attempts the player got their miss. In other words, scoring three, missing one, and scoring four is not somehow worse than missing the first shot and then rattling off seven straight. They both result in 7-for-8. What this means is that my streakiness has some “forgiveness” built in; you can miss some shots during a hot streak and still be considered hot. Conversely, you can score one in the midst of brick city and still be considered stone cold. Streakiness therefore needs to be a historic measure that looks at a player’s recent past to determine whether they are hot-or-not.

Keeping the above caveat in mind, I propose the following method to structure the shooting data. For each player with at least 100 3-point attempts in the 2015-2016 season, extract each of the 3-point attempts and chronologically order them from the start of the season to the end. For every make assign a 1 and for every miss assign a 0. Starting with the player’s 10th 3-point shot of the season, calculate a running 10-shot sum that continues to the end of the season. The running 10-shot sum is the “history” that accounts for the “forgiveness” I mentioned above and it allows an analyst to look at a player and ask “how’s he doing?” after any 3-point shot. Maybe the player is 3-for-10 over their last ten shots, maybe they’re 9-for-10. Whatever they are, this score out of 10 allows us to gauge streakiness across a season after any 3-point attempt (from the 10th onward).
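The original analysis was done in Excel, but the same running sum can be sketched in a few lines of Python. The function name and the example shot sequence below are invented for illustration; only the 1-for-make, 0-for-miss encoding and the 10-shot window come from the method above.

```python
from typing import List


def running_sums(shots: List[int], window: int = 10) -> List[int]:
    """Number of makes in each trailing `window`-shot span.

    `shots` is a chronological list of 1 (make) and 0 (miss).
    The first value covers shots 1..window, so the result has
    len(shots) - window + 1 entries (or none if there are too few shots).
    """
    if len(shots) < window:
        return []
    current = sum(shots[:window])
    sums = [current]
    for i in range(window, len(shots)):
        # Slide the window: add the newest shot, drop the oldest one.
        current += shots[i] - shots[i - window]
        sums.append(current)
    return sums


# Hypothetical 12-shot sequence, not real player data.
example_shots = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
example_sums = running_sums(example_shots)  # [3, 3, 4]
```

Twelve shots produce three running sums, because the first nine shots do not have a full ten-shot history behind them.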

Side note: why 10? Because humans have 10 fingers and 10 toes and we’re biologically predisposed to the decimal system? Also, no one in the data set had more than 10 3-point makes in a row in the 2015-2016 season and, in fact, only one person had 10 3-point makes in a row (take a bow Austin Rivers).
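If you want to check the longest run of consecutive makes for a player yourself, a minimal sketch could look like this. The function name and the sample sequence are hypothetical; it assumes the same 1/0 encoding of makes and misses.

```python
from typing import List


def longest_make_streak(shots: List[int]) -> int:
    """Length of the longest run of consecutive makes (1s) in `shots`."""
    best = 0
    current = 0
    for shot in shots:
        # Extend the run on a make, reset it on a miss.
        current = current + 1 if shot == 1 else 0
        best = max(best, current)
    return best


# Hypothetical sequence, not real player data.
example_streak = longest_make_streak([1, 1, 0, 1, 1, 1, 0])  # 3
```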

 

Some Worked Examples

It is possible (and fruitful) to chart the running 10-shot sums by player. The chart below is for Aaron Gordon and the signal on the chart reveals his recent historic performance after each shot of the 2015-2016 season.

A couple of things to notice:

The chart runs out to 133 but Aaron Gordon had 142 3-point attempts in 2015-2016. Yes, that’s true, but remember we lose the first 9 shots of the season since the idea of a running 10-shot sum doesn’t make sense for a point in the season with fewer than 10 shots taken. Every player will have 9 fewer points on these graphs compared to their total 3-point attempts in the season.

According to his chart, Aaron Gordon seems pretty consistent in his 3-point shooting; throughout the entire season he never goes above 5-for-10 and only drops to 0-for-10 once near the end of the season. Most of the time he’s hovering at the 3-for-10 mark. Yes! Thank you for noticing that! There is a reason that I picked Aaron Gordon for this example and that’s because he is the least streaky player in the data set. He is, in other words, Mr. Consistency (more on this very important point in a bit).

Do you want to see someone who is, arguably, the opposite of Aaron Gordon? Here’s Sasha Vujacic:

A thing you might notice about Vujacic (other than that his line is orange) is that his trend is all over the place. He flirts around at 5-for-10, then plunges down for a period in the 0-for-10 area, then back up to 6-for-10, then down again, then up again (to a period of 8-for-10!) then down, then finishes the season strong. Phew, his signal is a little … volatile.

I bet you can guess why I showed you Vujacic. Yes, it’s because my metric says he is the streakiest player in the data set. He has stretches of the season where his shooting is bad and then good and then bad again. He is the poster child for inconsistency even though he had a season 3-point percentage of 36.3% (more on this later).

I think these two examples point to a truth about streakiness (and I am going to bold it for emphasis): to be streaky is to be inconsistent and the benefit of the running 10-shot sum is that it gives us a new measure to gauge consistency. Because we’re not concerned about game shooting percentages we can put all players on an equal playing field (i.e. last ten shots), regardless of how many shots they take per game, and then measure a player’s performance across all eleven possibilities (i.e. 0-for-10 through 10-for-10) to get a sense of how consistent they are.

For example, if someone was to see Aaron Gordon nail a three and they turned to the coach and asked “how’s he doing?” it would not be unreasonable to say, sight unseen of the stats, “he’s 3 for his last 10”. Why? Because Gordon’s season chart indicates that he spends a lot of the season at 3-for-10. Conversely, if someone watched Vujacic nail a three and asked the same question the coach might just shrug since Vujacic could be anywhere along his trajectory, either in a downturn or an upswing, 1-for-10 or 7-for-10. Gordon, then, is the picture of consistency while Vujacic is the picture of inconsistency. Gordon is not streaky, he’s a steady 3-for-10 guy which means that steady equals consistent and hence streaky equals inconsistent.

Boiling Streakiness Down to a Number

This is all well and good but how do we measure streakiness (i.e. inconsistency) for everybody? The answer: histograms!

Looking at Aaron Gordon’s running 10-shot sum chart we (or rather Excel) can count off how many times he spends at each position. Adding them all up and making a histogram produces the following chart:

Look at that sharp, defined peak at 3-for-10! Nearly 47% of the time Gordon is in the 3-for-10 state for 3-point shots. This is exactly why the coach can guess, sight unseen, that he is 3-for-10, since he spends nearly half the time there. Of course, he spends time in other states, but they are all strongly adjacent, 2-for-10 and 4-for-10, with crisp fall-offs in either direction. The man is consistent!
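The counting step that Excel does here can be sketched in Python as well. This is an illustrative version, not the spreadsheet's formula: the function name is mine and the input is a made-up list of running 10-shot sums. It returns the share of the season spent in each of the eleven states, 0-for-10 through 10-for-10.

```python
from collections import Counter
from typing import List


def state_shares(sums: List[int], window: int = 10) -> List[float]:
    """Fraction of running sums that land on each state 0..window."""
    if not sums:
        raise ValueError("need at least one running sum")
    counts = Counter(sums)
    total = len(sums)
    # Index k of the result is the share of time spent at k-for-window.
    return [counts.get(k, 0) / total for k in range(window + 1)]


# Hypothetical running sums, not real player data.
example_shares = state_shares([3, 3, 4, 2])
# Half the time at 3-for-10, a quarter each at 2-for-10 and 4-for-10.
```

Working with shares rather than raw counts is what lets a 142-attempt player be compared with someone who took several hundred more shots.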

Now here’s Vujacic’s histogram.

Yikes! That’s a bit of a mess. It doesn’t even look like a normal curve, just a strange collection of zig zags. All we can say for sure after Vujacic nails a shot is that he is unlikely to be 10-for-10 (only the running-10-shot-sum-God Austin Rivers gets that courtesy). Vujacic is all over the place.

I know that I still haven’t shown you how to numerically quantify who is streaky and who isn’t; I just gave you another set of graphs. But seeing the histograms (especially Vujacic’s) helps conceptualize the next part of the discussion. I said above that Vujacic’s chart doesn’t look like a normal curve and, if you squint and turn your head a bit, Gordon’s does. This is important because thinking of the histogram in terms of a probability distribution (like the normal distribution) is key to figuring out who is streaky and who isn’t. Specifically, we want to quantify who is most inconsistent and name them the streakiest player in the league (among those who meet the criteria).

To get to that point, let’s briefly discuss the uniform distribution:

As you can see, the uniform distribution is super exciting. It is, in mathematical terms, a flat line that doesn’t change. But, if you use your imagination and think of it in terms of a basketball player shooting 3-point shots in the 2015-2016 season you can imagine a player of perfect, unfettered chaos who is equally likely to be 0-for-10 as he is to be 10-for-10. This player is an Elder God of streakiness: every time they sink a shot it is equally likely that they are at any spot on the running 10-shot sum spectrum.

But! This chthonic nightmare has a use because we can use him to compare against every one of our players and assess how close to madness they are. This is done by summing the absolute difference between a player’s value at each position and that of the uniform distribution. I call this number the residual. What the residual ultimately quantifies is how different the player’s histogram is from the uniform distribution. The smaller the number, the closer they are to the chaos. A score of 0 would indicate a complete match.
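Here is a minimal sketch of the residual as I have described it, assuming the histogram is expressed as shares that add up to 1 across the eleven states (so the uniform value at each state is 1/11). The function name and the two test distributions are illustrative only.

```python
from typing import List


def residual(shares: List[float]) -> float:
    """Sum of absolute differences between `shares` and a uniform distribution.

    `shares` holds the fraction of time spent in each state and should sum to 1.
    """
    if not shares:
        raise ValueError("need at least one state")
    uniform = 1.0 / len(shares)
    return sum(abs(share - uniform) for share in shares)


# Two extreme, hypothetical players over the 11 states 0-for-10 .. 10-for-10.
pure_chaos = [1.0 / 11] * 11          # equally likely to be anywhere
one_state_only = [0.0] * 11
one_state_only[3] = 1.0               # always exactly 3-for-10

chaos_residual = residual(pure_chaos)        # 0: a complete match
steady_residual = residual(one_state_only)   # 20/11, about 1.818
```

Under these assumptions the residual runs from 0 (pure chaos) up to 20/11, roughly 1.818, for a player who never leaves a single state.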

Here’s a visual of Gordon’s comparison.

Gordon has the maximum difference from the uniform distribution because he has a sky high peak at 3-for-10 and zeroes from 6-for-10 onward. His residual score is 1.2741.

Now here’s a visual of Vujacic’s comparison.

Not for nothing does Vujacic have the streakiest signal in the data set! His signal most closely matches the madness of the uniform distribution and his residual score is 0.6554.

So finally we have a number. All we need to do is calculate this absolute difference score, sort it from low to high and find the streakiest players in the data set. Conversely, we can sort the table from high to low and find the most consistent players in the data set.
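The sorting step is simple enough to sketch directly. The four residuals below are the ones quoted in this post; the variable names are just for illustration, and the real table of course covers all 212 players.

```python
# Residuals quoted in this post (lower means closer to uniform, i.e. streakier).
residuals = {
    "Aaron Gordon": 1.2741,
    "Sasha Vujacic": 0.6554,
    "Stephen Curry": 0.8326,
    "James Harden": 0.8345,
}

# Low to high: streakiest players first.
streakiest_first = sorted(residuals.items(), key=lambda item: item[1])

# High to low: most consistent players first.
most_consistent_first = sorted(
    residuals.items(), key=lambda item: item[1], reverse=True
)

for name, score in streakiest_first:
    print(f"{name}: {score:.4f}")
```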

Here’s a table that lists the top-20 for both:

Some interesting results here. First off, Kawhi is streaky? According to the numbers, yes. Korver, Carter and Iguodala are tossed around as streaky shooters, fine. What the chart doesn’t show is that Draymond is number 24, Jimmy Butler is 28, Steph Curry is 29 and Harden is 31 (yes, Steph is considered slightly streakier than Harden!). And the Westbeast is not considered streaky! He’s in position 156! What’s the world coming to?

Here are Curry’s and Harden’s seasonal charts:

Both Curry and Harden have, obviously, way more attempts per season than Gordon or Vujacic and consequently they have a lot more downturns and upswings. While Curry never hits the 0-for-10 bottom during the season, Harden does it several times. However, Curry also has a lot of severe, whiplash-like changes between 1-for-10 and 7-for-10 that ultimately account for his streakiness measure.

You can see that both of their histograms are much closer to Gordon’s than Vujacic’s and, while they look pretty identical, the residual calculation reveals that Curry (0.8326) is just slightly more inconsistent than Harden (0.8345). What it means is that the calculation is picking up on subtle differences that the human eye can’t pick out (thanks computers!).

Wrapping Up

I did have a notion that perhaps my methodology was biased against players with more 3-point attempts; perhaps more attempts meant more opportunities to go awry and enter the realm of streakiness. To investigate that I did a bog-standard correlation assessment between each player’s residual and the number of attempts they had in the season. I found a value of -0.126, which means they are essentially uncorrelated. I also decided to test the correlation between the residual and a player’s seasonal 3-point percentage. I found a correlation of -0.374, which is a really soft negative correlation, but it does suggest that players who are more inconsistent (i.e. streaky) have higher 3-point percentages, which is kind of counterintuitive (but the correlation is real small, so let’s disregard this line of thinking).
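For anyone who wants to redo the correlation check outside of Excel, here is a small sketch of a Pearson correlation. I am assuming Pearson is the "bog-standard" correlation in question; the function name and the toy numbers are invented and are not the values from the spreadsheet.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data only: hypothetical residuals and attempt counts for five players.
toy_residuals = [0.70, 0.85, 0.90, 1.10, 1.25]
toy_attempts = [400, 150, 600, 120, 300]
toy_r = pearson(toy_residuals, toy_attempts)
```

Squaring the coefficient gives a rough sense of scale: a correlation of -0.374 corresponds to about 14% of the variance, and -0.126 to under 2%.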

I’m honestly not sure what to make of this analysis. The reasoning seems sound to me but some of the results seem a little daffy. I really like the idea of running out the entire season of 3-point attempts as a sequence of 10-shot histories to quantify the streakiness and visual inspection of the charts seems to suggest that the players who are being flagged as streaky have swings in their performance across the season. I’ve attached the spreadsheet of data I used to produce this analysis at the bottom of this post so if anyone wants to have a look, run the analysis and poke holes in my efforts, I’d be glad to hear about the results. Have a good one.

The Excel spreadsheet with data is available here.