A few days ago, Jonathan Willis at Bleacher Report had a great article introducing his readers to a few of the most common NHL advanced
statistics. He breaks the stats up into team statistics and player statistics.
For team stats he mostly focuses on Fenwick and Corsi, which are the two most
popular ways that hockey analysts measure shot attempts.
Fenwick and Corsi are commonly thought of as a good way to
capture how much a team is dominating offensively, since teams only attempt a
shot when they're in their offensive zone. In other words, Fenwick and Corsi
are thought of as proxies for how much time a team spends in their offensive
zone. A natural conclusion to draw from this is that if your team dominates
your opponent in the Fenwick and Corsi stats, then most of the game was likely
spent in your offensive zone and your team probably won the game.
In this post I'm going to dig into Fenwick and Corsi and
show that neither stat is very good at predicting who wins hockey games in the
NHL. In fact, I'll show that the simple shots-on-goal stat does a better job at
predicting who will win a game than Fenwick and Corsi.
Before I go any further, here's a quick crash course on what
Fenwick and Corsi are. In every NHL game, the official play-by-play records
shots on goal, as well as missed shots and blocked shots. When Fenwick and
Corsi are used as team stats, their calculation is straightforward. A team's
Fenwick is their number of shots on goal (including goals scored) plus missed
shots. A team's Corsi is this number plus the number of shot attempts that were
blocked by the defense.
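The definitions above can be sketched in a few lines of Python. This is only an illustration: the `ShotCounts` container and the numbers in the example game are made up, not taken from the play-by-play data.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot-attempt totals for one team in one game (hypothetical container)."""
    shots_on_goal: int  # shots on goal, including goals scored
    missed: int         # attempts that missed the net
    blocked: int        # this team's attempts that the defense blocked


def fenwick(c: ShotCounts) -> int:
    # Fenwick = shots on goal (including goals) + missed shots
    return c.shots_on_goal + c.missed


def corsi(c: ShotCounts) -> int:
    # Corsi = Fenwick + attempts blocked by the defense
    return fenwick(c) + c.blocked


# Invented numbers for a single game
home = ShotCounts(shots_on_goal=30, missed=12, blocked=15)
away = ShotCounts(shots_on_goal=27, missed=10, blocked=9)

fenwick_diff = fenwick(home) - fenwick(away)  # 42 - 37 = 5
corsi_diff = corsi(home) - corsi(away)        # 57 - 46 = 11
```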
Often, people use the difference between the two teams'
Fenwick or Corsi measure as a way to measure the amount of time each team
possessed the puck in their own offensive zone. Because they ought to capture
offensive possession time, Fenwick and Corsi differential are often assumed to be
good indicators of who wins hockey games. The rest of the post will show why
this isn't the case.
The graphs above show the winning percentage of teams that finish
the game with more shots, higher Corsi, and higher Fenwick. These numbers (and all the stats in the post) were calculated based on the play-by-play from every
regular season game from the 2007-2008 season to the lockout-shortened 2013
season, about 6850 games in total.
As you can see, none of these stats are particularly good at
predicting who wins games, regardless of whether we look at all shot attempts
or just those that occur in 5-on-5 situations. Of the three statistics,
shot-on-goal differential is the best predictor of success.
Think of it this way. If these statistics are useful
predictors of who wins and loses games, then we should be able to do a good job
of picking the winner of a game based only on these stats. Imagine that we had
a list of 100 random NHL games, and we were provided the shot, Corsi, and
Fenwick differentials for each game, and we were supposed to guess which team
won the game.
If we flipped a coin 100 times, we'd expect to pick the correct team about
50 times. If we always picked the team that won the shot differential, we'd
pick about 53 games correctly. If we always picked the team that had the higher
Fenwick, we'd pick about 51 games correctly. And if we picked the team that had
the higher Corsi, we'd pick about 46 games correctly, 4 fewer than if we just
flipped a coin.
In other words, using
Fenwick as a predictor of who wins a hockey game is only marginally better than
flipping a coin. Predicting games based on Corsi differential is noticeably worse than just flipping a coin.
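The "always pick the team with the higher stat" rule is easy to sketch in code. The version below is a minimal illustration, not the code behind the graphs: the function name, the input format, and the toy data are all assumptions, and so is the choice to skip games where the two teams finish level in the stat, since the rule has no team to pick there.

```python
def leader_win_rate(games):
    """Share of games won by the team with the higher stat.

    games: iterable of (stat_diff, home_won) pairs, where stat_diff is
    home minus away (shots, Fenwick, or Corsi) and home_won is a bool.
    Games with stat_diff == 0 are skipped (an assumption of this sketch).
    """
    picks = [(diff > 0) == home_won for diff, home_won in games if diff != 0]
    return sum(picks) / len(picks) if picks else None


# Invented example: five games, one of them level in the stat
toy_games = [(5, True), (-3, True), (0, False), (-8, False), (2, False)]
rate = leader_win_rate(toy_games)  # 2 correct picks out of 4 decided games
```

A rate near 0.5 means the rule does no better than a coin flip, and a rate below 0.5 means you would do better by picking the team that trailed in the stat.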
Are big gaps in Fenwick or Corsi better predictors of winning?
One possible argument to defend Fenwick and Corsi against what
I've said so far is that a lot of games end with the Fenwick and Corsi
differential being extremely close. These games may add a lot of noise to the
data, making it difficult to see the positive correlation between Fenwick/Corsi
and winning percentages. While this argument seems valid, it doesn't hold up
when you test it with the data.
The graph above is
identical to the two earlier in this post, except this time I have dropped any games
in which the team differential in the statistic of interest was less than 5.
What is interesting is that, contrary to the argument suggested above, dropping
games with close shot, Corsi, or Fenwick differentials actually makes the
statistics worse at predicting who wins the game.
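Dropping the close games amounts to one extra filter before computing the win rate. The sketch below assumes a hypothetical list of (differential, home team won) pairs and keeps only games whose absolute differential is at least `min_diff`; the function name and the data are invented for illustration.

```python
def leader_win_rate(games, min_diff=1):
    """Win rate of the team with the higher stat, ignoring close games.

    games: iterable of (stat_diff, home_won) pairs, stat_diff = home - away.
    Only games with abs(stat_diff) >= min_diff are counted.
    """
    picks = [
        (diff > 0) == home_won
        for diff, home_won in games
        if abs(diff) >= min_diff
    ]
    return sum(picks) / len(picks) if picks else None


# Invented example data
toy_games = [(7, False), (-6, False), (3, True),
             (-2, False), (12, True), (-9, True)]

all_games_rate = leader_win_rate(toy_games)            # 4 of 6 picks correct
big_gap_rate = leader_win_rate(toy_games, min_diff=5)  # 2 of 4 picks correct
```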
The pure shot differential statistic is now predicting the
correct winner just 51 times out of 100. The difference in Fenwick -- in games
with a Fenwick differential of more than 5 -- performs worse than a coin flip
when predicting which team wins and which team loses. And the Corsi statistic
has become even worse. Teams that have 5 more shots, misses, and attempts
blocked than their opponent only win the game 44% of the time. The set of graphs
below illustrates why this is the case.
These three graphs show the winning percentages of teams who
had more shots, higher Corsi, or higher Fenwick. The x-axis is the value of the
shot/Corsi/Fenwick difference. The first graph, which focuses on shots on goal,
seems to show a positive relationship between shot differential and winning
percentage. Said another way, the more a
team outshoots their opponent, the higher the probability that they win the
game.
The analogous graphs for Fenwick and Corsi differences tell
a very different story. Until the Fenwick differential is more than about 25,
the relationship between Fenwick and winning percentage is flat. Teams in this
range seem to have about a 50/50 chance of winning the game. When the Fenwick
difference is greater than 25, there is not as much of a clear pattern. The graph
jumps around quite a bit partly because there are very few games with Fenwick
differences this high, making it difficult to pin down the relationship between
Fenwick and winning.
The graph for Corsi tells a slightly clearer story. Until
the Corsi difference is greater than 45, there seems to be a distinct downward
trend in the data. The greater the
difference in Corsi between two teams, the more likely it is that the team with
the lower Corsi value wins the game.
Do Fenwick and Corsi become better predictors of winning as the game goes on?
The last thing I'm going to explore in this post is whether
Fenwick or Corsi become better predictors of winning as the game progresses. So
far I've only looked at winning percentages of teams based on the full game's
Corsi or Fenwick stats. It could certainly be the case that these statistics
get better at predicting who wins as the game goes on.
To test this, I look at several points throughout close games
to see how good the stats are at predicting the winner. I start 15 seconds into
the game and remove any games which were not tied at this point. Then I
calculate the shots, Fenwick, and Corsi stats based on the remaining 59:45 of
the game. I then calculate the winning percentage of the teams who won the
shots/Fenwick/Corsi battle. After that, I go back to the full dataset and
repeat the process for :30, :45, 1:00, etc. until I have the winning
percentages at every 15 second interval in regulation.
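The procedure described above could be sketched roughly as follows. Everything about the data layout is an assumption made for illustration: each game is a dict with a `winner` ("home" or "away") and a list of `events` given as (seconds elapsed, team, kind) tuples, where kind is "goal", "shot", "miss", or "block", and a "block" is credited to the team that attempted the shot. Games in which the two teams are level in the stat over the remaining time are skipped.

```python
from collections import Counter

# Which event kinds count toward each stat (goals count as shots on goal)
KINDS = {
    "shots": {"goal", "shot"},
    "fenwick": {"goal", "shot", "miss"},
    "corsi": {"goal", "shot", "miss", "block"},
}


def win_rate_when_tied(games, t, stat):
    """Among games tied at second t, how often does the team that leads
    `stat` over the rest of the game go on to win?"""
    correct = total = 0
    for game in games:
        goals = Counter(team for sec, team, kind in game["events"]
                        if kind == "goal" and sec <= t)
        if goals["home"] != goals["away"]:
            continue  # not tied at this checkpoint, so drop the game
        rest = Counter(team for sec, team, kind in game["events"]
                       if sec > t and kind in KINDS[stat])
        diff = rest["home"] - rest["away"]
        if diff == 0:
            continue  # no stat leader to pick
        leader = "home" if diff > 0 else "away"
        total += 1
        correct += leader == game["winner"]
    return correct / total if total else None


def win_rate_curve(games, stat, step=15, regulation=3600):
    """Repeat the calculation at every `step` seconds of regulation,
    going back to the full list of games each time."""
    return {t: win_rate_when_tied(games, t, stat)
            for t in range(step, regulation, step)}


# Invented single-game example
toy_games = [{
    "winner": "home",
    "events": [(100, "home", "shot"), (200, "away", "miss"),
               (300, "home", "goal"), (400, "away", "shot"),
               (500, "away", "shot")],
}]
curve = win_rate_curve(toy_games, "fenwick")
```

Each checkpoint starts again from the full list of games, so a game that is tied at 5:00 and again at 40:00 contributes to both points on the curve.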
The graphs above show these winning percentages, based on
time remaining in regulation. Remember, all the numbers presented in these
graphs are based on games which were tied at that point. The takeaway from all three of these graphs is that these three
statistics get worse and worse at predicting the winner as the game goes on.
Shots on goal differential remains a positive predictor of
winning until about 10 minutes into the first period. Fenwick is a good
predictor for about five minutes, and then becomes worse than just flipping a
coin. And Corsi starts as a terrible predictor of winning, and gets worse from
there.
The last thing to note about these graphs is the big upswing
in all three at the far right. The reason all three of these become very good
predictors of winning at the end of the game is overtime. Games that are tied
with one or two minutes left in regulation almost always go to overtime. In
overtime, either team can end the game on the first shot.
When you think about it, none of my findings should come as a
surprise to anybody. Fenwick and Corsi incorporate plays which, by definition, won't
help you to win games. The best consequence of a missed or blocked shot is your
team retrieving possession of the puck, putting you right back where you were
before you attempted the shot. At worst, you're conceding possession to the
other team. This doesn't mean that teams shouldn't take lots of shot attempts,
since you don't know whether an attempt will be a shot, miss, or block. It does mean that analysts should think
hard about using Corsi or Fenwick as an indication of how well a team is
playing. You're taking a stat (shots on goal) that could directly influence the
outcome of games, and adding plays (missed and blocked shots) to it which, by
definition, cannot help a team win the game since they cannot directly put
points on the board.
In the next week or so I hope to write a follow-up post to
this one, where I'll suggest a couple of statistics that might be better team
statistics than Corsi or Fenwick.
Unfortunately, you fail to account for score effects. http://nhlnumbers.com/2013/12/5/score-effects-and-you
I actually do account for score effects in the section "Do Fenwick and Corsi become better predictors of winning as the game goes on?". I subset the data by 15 second intervals, and only look at games that are tied. Even when I do this, my conclusions hold up. For example in games that are tied with just 3 or 4 minutes left, there's still a strong and negative correlation between Fenwick/Corsi and winning the game.
I see a major flaw in your method for accounting for score effects. You are looking at tied games, and then saying, what is the correlation between outshooting beyond a certain point and winning. However, what happens is that when a team scores a goal, score effects take hold. Here is a good primer on score effects http://nhlnumbers.com/2013/12/5/score-effects-and-you.
Anyways, so while you are seeing a negative correlation between winning beyond some certain point in a tied game and shot differential, what is actually happening is that more often than not the better team is scoring a goal to go ahead, and then they get outshot for the rest of the game because of score effects. They are still the better team, and when tied will outshoot their opponents more often than not. But because they are defending a lead, they go into a defensive shell, allowing their opponents to take advantage and outshoot them for the rest of the game. That is why you are seeing Fenwick/Corsi predicting losses rather than wins.
Interesting findings (particularly the last set of plots). I have a few thoughts which I think are too long for a comment, so I threw up my own blog. Take a look if you like: http://occurrenceanalysis.blogspot.com/2013/12/the-interrelationships-between-score.html
Am I right in looking at these graphs, a Corsi differential of 60 or more happened 5 times, and the team with the higher Corsi lost in 4 of those? Wouldn't a 60 Corsi differential mean essentially domination?
Interesting, Tortorella is going to love this finding.
Three months later and no follow-up as promised?
I wish I could find the time to get back to this project. Between teaching, my dissertation, and other research I haven't had the time. I will eventually though.