Machine learning in AFL Part I - Does the free kick differential really matter?

Apr 25, 2018 8 min read

Introduction
What is machine learning?
Example
Where else can machine learning be used in AFL?

Introduction

Ever wondered how Google knows what a picture of a cat is? Or how it can translate English into Chinese effortlessly? Or how Netflix can somehow predict what show you would like to see? Or how a self-driving car works? The answer is A.I. and machine learning.

Machine learning has become the hype topic for CEOs all around the world. Many organisations have built machine-learning-based solutions into their business operations with huge success. For some, machine learning is the ‘unicorn solution’ or ‘golden key’ to unlocking hidden potential in the data they collect. But can such a powerful technology be used in sport and, in particular, the AFL? What do individual clubs, and the wider AFL body, stand to gain by using it?

In this article, I’ll give a basic introduction to what machine learning is and why it is so powerful, followed by a little toy example of how I inferred that free-kick differential has (statistically) almost nothing to do with your chances of winning. Then I’ll present some more real-life examples of how AFL clubs and the wider AFL body can use machine learning to create positive outcomes.

What is machine learning?

At some point in the last decade, the word “statistics” became unsexy, so it was rebranded as “machine learning”. At its core, machine learning is nothing more than using data to infer patterns, a.k.a. statistics. “Machine” refers to the computer and “learning” refers to the act of uncovering a pattern from the data. Often people say that machine learning is ‘A.I.’ (artificial intelligence), and while they aren’t wrong, they aren’t entirely right either. Machine learning is a type of A.I., but it is a long way from the sentient, Terminator-style Skynet robots taking over the world and destroying humanity. It is merely statistics.

Finding patterns in data is really useful, because once you’ve found a pattern, you can use it to infer things about whatever is generating that data. One of these things is predicting what will happen in a given situation, and this is why people love machine learning.

Simply put, you can use past data to either make inferences about current situations or predict the future based on events leading up to a situation.

Example

Let’s use a very simple example of machine learning in the AFL: on average, does “favourable” umpiring affect the outcome of matches?

To obtain data, I downloaded all match statistics from 2003 onwards from AFL Tables.

Then, for every game, I calculated the following statistics:

Difference in contested possessions
Difference in kicks
Difference in handballs
Difference in Uncontested possessions
Difference in marks inside 50s
Difference in Clearances
Difference in Tackles
Difference in 1 percenters
Difference in Marks
Home ground advantage factor
Difference in bounces
Difference in Clangers
Difference in Free kicks
Difference in Rebound 50s

I also recorded whether the home team won or lost.
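A minimal sketch of this feature-building step for a single game. The field names and numbers below are made up for illustration; they are not the actual AFL Tables columns or real match figures.

```python
# Hypothetical per-team statistics for one game. Field names and values are
# illustrative assumptions, not taken from AFL Tables.
home = {"kicks": 210, "handballs": 150, "marks": 95,
        "tackles": 60, "free_kicks": 18, "score": 98}
away = {"kicks": 195, "handballs": 170, "marks": 80,
        "tackles": 66, "free_kicks": 22, "score": 85}

STATS = ["kicks", "handballs", "marks", "tackles", "free_kicks"]


def game_features(home, away):
    """Return the home-minus-away differential for each statistic."""
    return {f"diff_{name}": home[name] - away[name] for name in STATS}


def home_won(home, away):
    """Outcome label, kept separate so it is never fed to the PCA."""
    return home["score"] > away["score"]


features = game_features(home, away)
label = home_won(home, away)

print(features)
print(label)
```

Repeating this for every game gives one row of differentials per match, with the win/loss label stored alongside but held back from the analysis.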

Then I used an unsupervised machine learning algorithm called Principal Component Analysis (PCA) to make the computer learn patterns in the data. What’s really important here is that I fed the computer all of the statistics except for the outcome of the match, which will be important later.

The way PCA works is that the computer looks for a small number of new variables, called “components”, each of which is a weighted combination of the original statistics. As such, instead of each match being described by a group of statistics (handballs, kicks, inside 50s etc.), it is described by a series of numbers, one per component (say 0.7 for component 1, 0.2 for component 2, 0.3 for component 3). The components are constructed to be uncorrelated with one another, and the idea is that each one represents some hidden property of the data. After doing this, it is my job to try to interpret what each component represents.

Now, PCA is done in such a manner that the first component is the “strongest” and captures the largest share of the variability in the data, the second component the second largest and so forth, so really we only have to look at maybe the first two or three components to get a reasonable picture of what is going on. As such, we often refer to PCA as a type of “dimensionality reduction”, since it takes many statistics (handballs, kicks, inside 50s etc.), does some calculations, and spits out a smaller set of variables that retain most of the statistical explanatory power of the original set.
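To make the mechanics concrete, here is a rough sketch of PCA written with NumPy's singular value decomposition. It runs on synthetic data rather than the real match statistics: the hidden "dominance" factor, the column scales and the helper name `pca` are all assumptions made for illustration, and the tooling used for the original analysis is not stated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the match data: 500 games x 4 differentials.
# A hidden "dominance" factor drives the first three columns; the fourth
# (think free kicks) is independent noise. None of this is real AFL data.
n_games = 500
dominance = rng.normal(size=n_games)
X = np.column_stack([
    20 * dominance + rng.normal(scale=10, size=n_games),
    8 * dominance + rng.normal(scale=5, size=n_games),
    5 * dominance + rng.normal(scale=4, size=n_games),
    rng.normal(scale=5, size=n_games),
])


def pca(X):
    """PCA via SVD on standardised columns.

    Returns (scores, components, explained_variance_ratio).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # put every statistic on the same scale
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                             # each game's coordinates on the components
    explained = S**2 / np.sum(S**2)            # share of total variance per component
    return scores, Vt, explained


scores, components, explained = pca(X)
print(np.round(explained, 3))                  # shares come out largest first
```

The explained-variance shares are ordered from largest to smallest, which is why looking at only the first two or three components is usually enough.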

To understand how to interpret the first two principal components, I plotted each game on an x-y plot, where the coordinates of each game are its PC1 and PC2 values respectively. I then coloured each point by whether the home team won (blue) or lost (pink). Remember, I didn’t give the computer the outcome of the match. You can find the graph here.

What we can see is that there is a strong separation between winning and losing along the x-axis (PC1) and almost none along the y-axis (PC2). What the computer has done, all by itself, is figure out one variable that tracks winning or losing (PC1) and another that doesn’t (PC2). So PC1 corresponds to winning and losing, while PC2 corresponds to something else that has little to do with the result, which is what we care about.
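The same check can be done numerically instead of by eye. The sketch below uses synthetic data again (the "dominance" factor, the noise levels and the `separation` helper are illustrative assumptions, not the real data or the original code): it fits PCA without the outcome, then measures how far apart wins and losses sit on each component.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic games: three statistics driven by a hidden "dominance" factor,
# plus one unrelated column. Illustrative only, not real AFL data.
n = 1000
dominance = rng.normal(size=n)
X = np.column_stack([
    20 * dominance + rng.normal(scale=10, size=n),
    8 * dominance + rng.normal(scale=5, size=n),
    5 * dominance + rng.normal(scale=4, size=n),
    rng.normal(scale=5, size=n),
])
# The outcome depends on dominance plus luck; it is never shown to the PCA.
home_won = (dominance + rng.normal(scale=0.5, size=n)) > 0

# PCA on standardised columns, without the outcome.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T


def separation(pc, won):
    """Gap between the mean score of wins and losses, in units of the
    component's standard deviation. The absolute value is taken because
    the sign of a principal component is arbitrary."""
    return abs(pc[won].mean() - pc[~won].mean()) / pc.std()


sep_pc1 = separation(scores[:, 0], home_won)
sep_pc2 = separation(scores[:, 1], home_won)
print(f"PC1 separation: {sep_pc1:.2f}")
print(f"PC2 separation: {sep_pc2:.2f}")
```

A large gap on PC1 and a small one on PC2 is the numerical counterpart of the blue and pink points splitting left and right but not up and down.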

Okay, so PC1 and PC2 are great, and we know what PC1 means, but how does this relate back to our original question? To add to our visualisation, we can overlay each original variable, projected onto the PC1-PC2 plane, as an arrow. We can do this because, by construction, each PC is simply a linear combination of all of the original variables. This is shown here.

Basically, the way to interpret this is that the more the arrow for a particular statistic points along the x-axis, the more that statistic is associated with winning or losing. Sweet! So PC1 is most associated with marks inside 50, contested possessions, clearances, hitouts and kicks, while PC2 is mostly associated with one-percenters, free-kick differential and clangers. In other words, highly contested, high-possession, high-penetration football is conducive to winning, while clangers and one-percenters have very little influence on winning. And, as much as everyone expected, frees-against has as much to do with winning as which round you play (almost none).
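The arrows in a plot like this are just the weights (often called loadings) that each original statistic has on PC1 and PC2. A minimal sketch on synthetic data follows; the column names, coefficients and the decision to use unscaled loadings are assumptions for illustration, and some biplots scale the arrows differently.

```python
import numpy as np

rng = np.random.default_rng(2)

names = ["contested_possessions", "marks_inside_50", "clearances", "free_kicks"]

# Synthetic differentials: the first three follow a hidden "dominance"
# factor, while free kicks are independent noise. Not real AFL data.
n = 1000
dominance = rng.normal(size=n)
X = np.column_stack([
    20 * dominance + rng.normal(scale=10, size=n),
    8 * dominance + rng.normal(scale=5, size=n),
    5 * dominance + rng.normal(scale=4, size=n),
    rng.normal(scale=5, size=n),
])

Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

# Rows of Vt are the components; transposing the first two gives, for each
# statistic, the (x, y) tip of its arrow in the PC1-PC2 plane.
loadings = Vt[:2].T

for name, (x, y) in zip(names, loadings):
    print(f"{name:>22s}  PC1={x:+.2f}  PC2={y:+.2f}")
```

In this toy setup the free-kick arrow points almost entirely along PC2, which mirrors the way the arrows are read above: a statistic with little weight on PC1 has little to do with the win/loss axis.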

Where else can machine learning be used in AFL?

Prediction in sport has a wide variety of uses. One of these is using statistics to try to predict the outcome of a match. This might be great for a punter, but it’s not that useful at a club or institution level. Here are some other uses of machine learning that could be put into practice at the AFL with great outcomes for all.

Predicting injury

As we know, hamstring injuries are very common amongst AFL players. Machine learning could help identify the risk factors that lead to hamstring injuries, helping to prevent them. The data you could use might include peak running load, impact during games, cumulative running and impact load throughout the week, sleep quality, diet and other sports-performance indicators. You could then look at the historical incidence of hamstring injuries and try to pinpoint the pattern of factors that tends to precede an injury. Once you know these factors, you could pre-emptively rest or manage the player.

Predicting talent

ML is a very powerful tool for predicting likely talent. At the combine, young prospective draftees are put through a rigorous set of tests in order to rank them in the draft, so a lot of performance data is collected. We also know that, historically, the best players in the game weren’t always picked number 1 in the draft. Recruitment managers at the club level could use ML and data from the combine to predict a candidate’s likely future performance. This way, they could target someone who may not necessarily draft high, but who might have the necessary attributes to turn into a superstar, hence getting a much better deal. To be honest, I would be surprised if talent recruiters aren’t doing this already. But if they aren’t, or are using only basic rules of thumb, then they should definitely use machine learning.

Assessing player worth

In a similar vein to predicting talent, ML can be used during trade week. Actually, you would have to combine ML here with your own judgement, but essentially, you use ML to help assess the worth of the player being traded out or traded in. Will Motlop deliver 4 years of good service, or will his body break down and leave him with only 10 good games? Is Jake Lever really worth a billion dollars per year? This approach will help you determine how much you would be willing to pay for each player, since you would have an estimate of how likely they are to perform. If you were really fancy, you could combine this with a prediction of what someone else would be willing to pay so you could set the asking price. I think this actually strays into game theory, so we won’t discuss it here, but it could certainly be done with a little more effort.

In-game strategy

This is basically a variant of what I just described, except way more detailed and with way better statistics. The idea is to break each team down into its various components and figure out ways to derail them. Are they more susceptible to man-on-man play? Do they rely too much on rebound from the half-back line? While experts and footy analysts would obviously be super adept at assessing this, machine learning would give you an objective, unbiased opinion based purely on statistical inference. It would then be another tool for the coaching staff.