VANCOUVER, British Columbia -- Mike Sullivan considers himself a skeptic by nature. And as he said after Monday's practice here, the Penguins' analytics department would probably tell you the same.
That's not to say that Sullivan isn't a "believer" in analytics. There's nothing to "believe" in -- they're just numbers. It's more data quantifying what is happening on the ice, simply more information. By being a "skeptic," Sullivan means that he wants to really understand what he's getting as much as possible.
"I ask a lot of questions," he said. "I ask a lot of 'Why?' and, 'How accurate is this?' 'Where's the source?' Sometimes there's a lot of numbers that are out there from multiple different sources."
The Penguins have a fairly robust analytics department now under Kyle Dubas. There's Katerina Wu, serving as the senior data scientist. Working under her on the hockey research and development staff are analyst Cam Charron, data engineer Jacob Pavlovich, software engineer Luke Zolyak and data scientist Caleb Peña. Andy Saucier, the Penguins' director of professional personnel, serves as a conduit between the analytics staff and the rest of the staff.
Sullivan's media availability on Monday turned into a TED Talk of sorts about the data, how the Penguins gather it and what they do with it. And he cleared up two common misconceptions some fans have regarding analytics.
First, there's the notion that by "analytics," we're talking about a bunch of numbers that boil down to shot attempts. Most of what's publicly available on sites like Natural Stat Trick or the NHL's own advanced stat pages is built on shot attempts -- things like total shot attempts (including ones that were blocked or missed the net), unblocked shot attempts, high-danger attempts or expected goals, which assigns values to different kinds of shot attempts.
No, the members of the Penguins' analytics staff aren't just firing up Natural Stat Trick and regurgitating and interpreting those numbers. They collect data from a wide variety of sources, including some things they track manually in-house. It goes far beyond just shot attempts, and it goes far beyond those numbers the average fan or reporter uses.
"We have an internal analytics department that helps us," said Sullivan. "We also have other analytics sources that we pay for that we bring in and gives us an opportunity to look at neutral sources. Quite honestly, after every single game, our coaching staff has a deep discussion, and we compare all of the different sources that we get. Then we track certain things manually ourselves as a coaching staff and we hold those numbers against some of the numbers with these different sources. Hopefully that acts as a little bit of a check and balance against what information we get and what information we actually value, and what information we tend to discount because we simply don't think it's accurate."
The other notion Sullivan discounted is that the Penguins are making decisions straight off of these numbers. Obviously, there's a whole lot more that goes into making any kind of roster decision. Sullivan said that where analytics can help is "red flagging trends."
"What it does for our staff mostly is it forces a deeper conversation into our team," he explained. "Maybe we get a look at things, 'Hey, these numbers are suggesting this, we need to watch this when we're breaking the film down', or, 'We need to keep an eye on this.' And so our coaching staff has a deeper discussion around those things."
An example of that is the Penguins' expected goals compared to their actual goals scored. "Expected goals" is a metric that recognizes that not all scoring chances are created equal and aims to quantify the quality of those chances. It takes different factors that go into a scoring chance -- like distance, or whether it was off a rush or a rebound -- and assigns the chance a value based on the NHL-average probability that a chance of that type becomes a goal. The end result is a total number to represent the goals that would be expected on league average. Sullivan often refers to different "models" when talking about things like expected goals. There are different sources that assign different probabilities to different elements, and so come up with different end results.
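As a rough illustration of the idea described above, here is a minimal sketch of how an expected goals total could be computed. It is not the Penguins' model or any public site's model. The distance cutoff, the shot categories and every probability in the table are invented for the example, and the names (GOAL_PROBABILITY, ShotAttempt, expected_goals) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical league-average goal probabilities by (distance band, shot type).
# These values are invented for illustration and are not real NHL figures.
GOAL_PROBABILITY = {
    ("close", "rebound"): 0.30,
    ("close", "rush"): 0.20,
    ("close", "other"): 0.12,
    ("far", "rebound"): 0.08,
    ("far", "rush"): 0.05,
    ("far", "other"): 0.02,
}


@dataclass(frozen=True)
class ShotAttempt:
    distance_ft: float
    shot_type: str  # "rebound", "rush", or "other"


def distance_band(distance_ft: float) -> str:
    # Assumed cutoff: 25 feet or closer counts as "close".
    return "close" if distance_ft <= 25.0 else "far"


def expected_goals(shots) -> float:
    # Each attempt contributes its assumed probability of becoming a goal;
    # the expected goals total is the sum of those probabilities.
    return sum(
        GOAL_PROBABILITY[(distance_band(s.distance_ft), s.shot_type)]
        for s in shots
    )


shots = [
    ShotAttempt(10.0, "rebound"),  # 0.30
    ShotAttempt(40.0, "other"),    # 0.02
    ShotAttempt(20.0, "rush"),     # 0.20
]
total_xg = expected_goals(shots)  # roughly 0.52
```

Swap in a different probability table and the same three attempts produce a different total, which is why different models arrive at different numbers.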
In the Penguins' case, they rank third in the league in rate of expected goals (3.43 per 60 minutes) but 20th in rate of actual goals scored (2.95 per 60). That's not a matter of their shots coming from a distance or from bad angles, because the whole point of expected goals is that it accounts for the quality of shot attempts. If they were limited to the perimeter and kept out of the high-danger areas, that would show up as a lower expected goals number.
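The arithmetic behind that gap is simple enough to sketch. The two rates below are the ones cited above; the function names are hypothetical, and the per-60 helper is only a generic way of turning a raw total into a rate, not a description of how any particular site calculates it.

```python
def per_60(total: float, minutes_played: float) -> float:
    # Scale a raw total to a rate per 60 minutes of ice time.
    return total * 60.0 / minutes_played


def finishing_gap(xg_per_60: float, goals_per_60: float) -> float:
    # Negative means the team is scoring less than its chances would suggest.
    return goals_per_60 - xg_per_60


# Rates cited in the article: 3.43 expected goals per 60, 2.95 actual goals per 60.
gap = finishing_gap(3.43, 2.95)  # about -0.48 goals per 60
```

In other words, the Penguins are scoring roughly half a goal less per 60 minutes than the quality of their chances would suggest.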
This is a trend that would be "red flagged." The Penguins aren't going to be satisfied with racking up expected goals without the actual goals to match, and there's a clear trend in the discrepancy. So that forces a conversation on what to do about it.
"We talked to them about making the goalie sight lines difficult, having a net presence, creating traffic," Sullivan said last week on how to make those numbers better line up. "We can create off broken plays. Having said that, a lot of the expected goals models, part of those algorithms, they build in the quality of looks. So it should reflect in those numbers when the looks are quality."
Even with the variety of sources and things tracked in-house by teams, a lot of this data is in its infancy. That's not saying that there isn't much out there as far as the numbers go. Rather, it's that there can be too much information out there. The challenge is learning what numbers are actually useful.
"I think it's a necessary evil, what our sport is going through," Sullivan said of that learning process. "I think we're getting better at it every year. ... I think Kyle is a really progressive thinker in that regard also, just trying to figure out what's relevant. There is such a mass of information out there. You can't pay attention to all of it, because you'll confuse yourself. Or at least I'm not smart enough. But there are things that we do pay attention to that we think give us a much clearer understanding of what the true story is."
Sullivan said that at the end of the day, analytics is the "measure of the process." He often says that you can't control if the puck goes into the net, but you can control the process and make sure you're getting those high-quality chances.
"You're supposed to quantify the process," Sullivan said of the idea behind the numbers. "Whether it be creating scoring opportunities, or defending against them in every aspect -- off the rush, in-zone, on the power play, on the penalty kill, things of that nature. We're trying to better understand every different aspect of our game."
Analytics can just be another tool to better that understanding.