I’m a big fan of FiveThirtyEight’s weekly puzzle, the Riddler, which has an interesting mix of puzzles with a mathematical flavor. This week’s puzzle asks you to pick the best starting score for a tennis game against a much stronger opponent, and it has a very neat solutions with iterative calculations (circular references) in Excel.
Here’s the full puzzle:
Your wish has been granted, and you get to play tennis against Roger Federer in his prime in the Wimbledon final. You have only a 1 percent chance to win each point, but Roger, sporting gentleman that he is, offers to let you name any score and begin the match at that point. (So, if you’ve entertained a fantasy of storming back after being down three match points in the fifth set, now’s the time to live it.) What score can you name that gives you the best chance to win, and what is your chance of winning the title?
Now, if you know tennis scoring, there are really only a few potential candidates for where you’d want to start – but with some help from Excel, it’s not hard to work out the odds from any starting score.
Start by looking at a game. You need to score 4 points and be ahead by at least 2 points to win (ignore the weird naming convention). If P(a,b) is your probability of winning when you’re on a points and your opponent is on b points, then as long as the game hasn’t ended, we have P(a,b) = 1% * P(a+1,b) + 99% * P(a,b+1). Filling in the win conditions (in pink) and this formula (in blue) gives you most of the grid below. The last step is to note that P(3,3) = P(2,2), since in each case the players are tied, and the first one to pull ahead by 2 points wins. If you add this in as the green formula with iterative calculations on, you get the odds of winning a game from any starting point:
[Note: I’ve set the number formatting to be a % as long as it’s above 1% and scientific notation below that, since some of the odds are going to get VERY tough. You can use this custom number format code to get this result: [<0.01]0.00E+00;[>=0.01]0.00% ]
You can do the same thing to figure out who will win a tiebreaker game (first to 7 points and a 2-point lead):
And once you know the odds of winning a game and a tie-breaker, you can work out the odds of winning a normal set (i.e. one with a tie-breaker) – just replace the 1% in purple cell with the 1.50 x 10^-7 odds of winning a game, and link the 7.67 x 10^-12 odds of winning a tie-breaker in the orange cell.
You can also work out the odds of winning a set without a tie-breaker, which works just like a game (first to 6 games and a 2-game lead):
And with that, you can put it all together to work out the odds of winning a match. This time, the purple cell has the odds of winning a (normal) set, 2.35 x 10^-39, and the orange cell has the odds of winning a fifth set (i.e. one without a tie-breaker) – although the two are almost identical.
So the odds of beating Roger Federer in a full match are ~10^115 against – ouch!
As an aside, this definitely understates your odds of winning in real life – the second post from the wonderful What If blog (if you haven’t read it, you should) determines that the odds of guessing every question right on the SAT is ~4 x 10^-111, and concludes:
This means that the odds of acing the SAT by guessing are worse than the odds of every living ex-President and every member of the main cast of Firefly all being independently struck by lightning … on the same day.
…and the odds for SAT guessing are over 25,000 times better than the odds of winning a match according to these assumptions. Expect a lot more black swans than that in real life!
Anyway, back in ridiculous-theory land we want to know the odds starting from any score, so let’s work that out too…
Given a starting score (# of sets, games in current set, and points in current game for each player), we can easily look up the odds of the player winning the current game (in either the game table or the tie-breaker table above, depending on the set / game score). Then, the probability of them winning the current set will be
P(set) = P(game) * P(set|P1 wins current game) + (1 – P(game)) * P(set|P2 wins current game)
We’ve already got P(game), and we can look up the probability of winning the set in the relevant set table above. We can then do the same thing to go from the odds of winning the set to the odds of winning the match.
Finally, we need to list out all possible starting scores which have different odds of winning (we don’t need to list 40-40 and 30-30 in the same game, for example, because as explained above they have the same odds of victory). I leave constructing this list as an exercise for the eager reader (you can check your answer against my list in the Excel file attached, although I can’t guarantee I got it all right myself!).
Once we have the list, we add in the probability of winning the current game, set, and match using the formulas above, then sort by descending probability of winning the match. The result starts off like this:
So, the best odds of winning come if we start off 2 sets up, 6 games each in the current set, and 6 points ahead in the tie-breaker, and the odds of victory are just below 6%. This makes sense, as if we win any of the next 6 points, we win the match. The odds in the next row, which is the same except that Roger has another set in hand, are almost identical – that’s because the odds of winning a set are so low (~10^-40) that if we don’t win the tie-breaker, we’re toast either way.
What if the split for any point was a more balanced 60 / 40? You’d expect to stand a reasonable chance of winning, but actually your odds are pretty dire – 26.4% chance to win a game, 3.4% to win a set, and just 0.4% chance to win a match. The chart below shows the odds of winning each type as a function of the probability of winning each point:
This model would be a lot more realistic if the odds of victory changed according to who was serving (a quick google suggests that between equally matched players, the one serving has about a 60-65% chance of winning the point), but I need to go and do other things now…
If you’d like to take a closer look at / correct likely errors in / expand my model, you can download my spreadsheet at the link below. Note that it doesn’t include the data table to create the chart, because the combination of the data table and the iterative calculations was making it run a bit too slowly…