I have to say I really enjoyed modelling this one!
Sometimes, when you’re building models to maintain an edge over the bookmakers, things can get awfully complicated. You want to account for so many things that you easily get carried away and lose focus on the essentials. Models tend to become overly complicated and are no fun to maintain, when they could be simple and beautiful.
This darts model is simple and beautiful!
I’ve been following the darts scene for a very long time and have always enjoyed watching it. I have been to several World Championships in recent years and have also implemented a model for matchups/moneylines for my betting. I enjoy outright betting most, so I really wanted to come up with a proper outright model, too.
So what is this model all about and how does it work? Let’s dive in.
To generate tournament predictions we first need to be able to quantify the outcome probabilities of individual matches. I’m using a blend of Elo ratings, current form and bookmaker prices to compute match estimates.
However, we must account for the fact that tournaments are played in different formats. The PDC European Tour, for instance, plays (most of) its matches in a ‘First-To-6’ leg format. The PDC World Championship is played in a ‘First-To-X’ set format (with 3 legs needed to win a set).
To account for the many different formats, we first have to calculate the probability that player 1 wins a LEG against player 2. Once we have the leg probability, we can compute the match probability using the cumulative beta distribution (the regularized incomplete beta function), which treats every leg as independent with the same win probability. In Excel this would look like this:
prob_match = BETA.DIST(prob_leg, format, format, TRUE)
with
prob_leg … probability of player 1 to win a leg
format … the leg-format of the match (e.g. 6 would mean a ‘First-To-6’ legs match).
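For readers without Excel, here is a minimal Python sketch of the same calculation. It relies on the fact that BETA.DIST(p, n, n, TRUE) equals the probability of winning n legs before the opponent does when each leg is won independently with probability p, which can be written as a plain sum. The function names `match_prob` and `set_match_prob` are my own, and the set-format helper is only an assumption about how a ‘First-To-X’ sets match could be handled (it ignores who throws first and any tie-break rules).

```python
from math import comb


def match_prob(prob_leg, first_to):
    """Probability that player 1 wins a 'First-To-<first_to>' legs match.

    Assumes every leg is independent and won with probability prob_leg.
    Sums over k, the number of legs the opponent wins before player 1
    collects the deciding leg.
    """
    q = 1.0 - prob_leg
    return sum(
        comb(first_to - 1 + k, k) * prob_leg**first_to * q**k
        for k in range(first_to)
    )


def set_match_prob(prob_leg, sets_to_win, legs_per_set=3):
    """Hypothetical helper for set formats: apply match_prob twice.

    First turn the leg probability into a set probability, then treat
    sets like legs of a 'First-To-<sets_to_win>' match.
    """
    prob_set = match_prob(prob_leg, legs_per_set)
    return match_prob(prob_set, sets_to_win)


if __name__ == "__main__":
    # The same leg probability gives a bigger edge in longer formats.
    for fmt in (5, 6, 16):
        print(fmt, round(match_prob(0.6, fmt), 4))
```

The 0.6 leg probability in the demo is just an illustrative number, not a rating for any real player.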
The format matters A LOT! Imagine MvG playing a ‘First-To-16’ match against Keegan Brown: you would expect him to be a heavy favourite. If they played a ‘First-To-5’ match, things would be a lot closer. In fact, for this particular matchup MvG would be an 87% favourite to win a ‘First-To-16’ match, but only a 72% favourite in a ‘First-To-5’ match.
Armed with these match probabilities it’s a matter of playing the tournament millions of times. This is called a Monte Carlo simulation.
In order to determine the winner of each match we generate a random number between 0 and 1 and compare this number with the match probabilities.
Let’s walk through one example. Let’s assume Michael van Gerwen plays William O’Connor, with MvG being an 85% favourite to win the match. If the random number generator throws a number < 0.85, MvG is considered the winner. A random number between 0.85 and 1 would mean O’Connor is the winner.
The loser is eliminated and the winner advances to the next round.
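The single-match step can be sketched in a few lines of Python. The function name `simulate_match` and the fixed seed are my own choices for illustration; the 85% figure is the one from the example above.

```python
import random


def simulate_match(p1_win_prob, rng):
    """Return 1 if player 1 wins the simulated match, otherwise 2.

    rng.random() is uniform on [0, 1), so it falls below p1_win_prob
    with exactly that probability.
    """
    return 1 if rng.random() < p1_win_prob else 2


rng = random.Random(42)  # fixed seed so the run is repeatable
trials = 100_000
mvg_wins = sum(simulate_match(0.85, rng) == 1 for _ in range(trials))
share = mvg_wins / trials  # should land close to 0.85

if __name__ == "__main__":
    print(f"MvG won {share:.1%} of {trials} simulated matches")
```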
We repeat this process for every match and every round. A tournament with 64 entries has
log(64, 2) = 6
rounds until a winner is determined (log x to the base of 2).
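Putting the rounds together, a knockout simulation might look like the sketch below. Everything here is illustrative: the four-player field, the `strength` numbers and the `win_prob` function are made up for the demo and stand in for the real match estimates.

```python
import math
import random
from collections import Counter


def simulate_tournament(bracket, win_prob, rng):
    """Play one single-elimination tournament and return the winner.

    bracket:  players in draw order (length must be a power of two);
              neighbours in the list meet in the first round.
    win_prob: callable (a, b) -> probability that a beats b.
    """
    players = list(bracket)
    while len(players) > 1:
        next_round = []
        for a, b in zip(players[0::2], players[1::2]):
            # The loser is dropped, the winner advances.
            next_round.append(a if rng.random() < win_prob(a, b) else b)
        players = next_round
    return players[0]


# Hypothetical player strengths, purely for the demo.
strength = {"A": 8.0, "B": 4.0, "C": 2.0, "D": 1.0}


def win_prob(a, b):
    return strength[a] / (strength[a] + strength[b])


bracket = ["A", "D", "B", "C"]
rounds = int(math.log2(len(bracket)))  # 2 rounds for 4 players

rng = random.Random(1)
trials = 20_000
wins = Counter(simulate_tournament(bracket, win_prob, rng) for _ in range(trials))

if __name__ == "__main__":
    for player, n in wins.most_common():
        print(player, n / trials)
```

With real data, `win_prob` would return the match probability for the format of the round being played.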
Each simulation gives us a winner. Perhaps Michael van Gerwen wins a particular simulation. That’s one possibility – and currently it would be the most likely one – but he would certainly not win every simulation.
That’s why we need to run the simulation millions of times. I tend to run 2.5 million trials for a 64-player field as such big fields mean big variance. Over that many simulations, MvG wins sometimes (but not always), Gary Anderson wins a little less often and the no-hopers might not even win once in a million trials.
In fact Michael van Gerwen has won 350k simulations while Dyson Parody hasn’t won a single simulation in 2.5 million trials as you can see from the predictions for the 2019 Gibraltar Darts Trophy.
On a side note, it takes my Apple iMac 27″ (from 2013) almost 6 hours to run all 2.5 million simulations. I love seeing my machine sweat!
The number of wins is divided by the number of simulations in order to derive the probability of one player winning the tournament. If we take the inverse we’ll get the fair odds which can then be easily compared against bookmaker prices to look for potential value bets.
fair_odds = number_of_simulations / number_of_wins
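As a small sketch of that last step, the snippet below turns win counts into fair decimal odds. The function name `fair_odds` is my own, the 350,000 figure is the rounded count quoted above, and a player with zero wins is given infinite odds because the simulation offers no estimate for him.

```python
def fair_odds(win_counts, n_simulations):
    """Map each player to fair decimal odds = simulations / wins."""
    odds = {}
    for player, wins in win_counts.items():
        # No wins means no usable estimate; report infinite odds.
        odds[player] = n_simulations / wins if wins > 0 else float("inf")
    return odds


counts = {"Michael van Gerwen": 350_000, "Dyson Parody": 0}
odds = fair_odds(counts, 2_500_000)

if __name__ == "__main__":
    for player, price in odds.items():
        print(player, round(price, 2))  # van Gerwen comes out at about 7.14
```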
As you would expect, the draw affects outright probabilities quite a bit. If Michael van Gerwen, Gary Anderson, Peter Wright and Rob Cross were all in the same half, players in the opposite half of the draw would have a much ‘easier’ path to the final than if they were in that same half.
This model accounts for both tournaments with a set draw (e.g. European Tour events) as well as for tournaments with a random draw (e.g. the UK Open).
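One simple way to handle both cases in a simulation is to reshuffle the remaining players before each round when the draw is random, and to leave the order untouched when the draw is set. The sketch below is only my guess at how this could be done, not a description of the actual model; it also ignores special features such as players entering in later rounds.

```python
import random


def play_round(players, win_prob, rng):
    """Pair neighbours in the list and return the winners in order."""
    return [
        a if rng.random() < win_prob(a, b) else b
        for a, b in zip(players[0::2], players[1::2])
    ]


def simulate(players, win_prob, rng, random_draw):
    """Run one knockout tournament.

    random_draw=False keeps the bracket fixed (set draw).
    random_draw=True reshuffles the survivors before every round.
    """
    players = list(players)
    while len(players) > 1:
        if random_draw:
            rng.shuffle(players)
        players = play_round(players, win_prob, rng)
    return players[0]


if __name__ == "__main__":
    # Hypothetical coin-flip field, just to show both modes run.
    field = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
    rng = random.Random(7)
    print(simulate(field, lambda a, b: 0.5, rng, random_draw=False))
    print(simulate(field, lambda a, b: 0.5, rng, random_draw=True))
```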
I will try to publish the tournament predictions before every major PDC event on my Twitter account @BettingIsCool.
The predictions also include the value and advised stakes based on the Kelly criterion, on a scale from 1 to 100.
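For reference, the textbook Kelly fraction for a bet at decimal odds o with win probability p is (p * o - 1) / (o - 1). The sketch below computes that fraction together with the value of the bet. How the fraction is mapped onto the 1 to 100 scale is not covered here, so the sketch stops at the raw fraction; the function names and the bookmaker price of 8.0 are made up for the example.

```python
def value(prob, bookmaker_odds):
    """Expected profit per unit staked; positive means a value bet."""
    return prob * bookmaker_odds - 1.0


def kelly_fraction(prob, bookmaker_odds):
    """Share of bankroll suggested by the Kelly criterion.

    Returns 0.0 when the bet has no positive value.
    """
    edge = value(prob, bookmaker_odds)
    if edge <= 0.0:
        return 0.0
    return edge / (bookmaker_odds - 1.0)


if __name__ == "__main__":
    prob = 350_000 / 2_500_000  # simulated win probability, 0.14
    price = 8.0                 # hypothetical bookmaker price
    print(round(value(prob, price), 4))
    print(round(kelly_fraction(prob, price), 4))
```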
I hope these forecasts serve as a supplementary tool for your betting, or even just as a prompt for a side bet every now and then. Don’t go wild with your stakes though!
And always remember the beauty of simple things!