Using years of Tour data to pick a winning break
An ever-present feature of a Tour de France stage is the breakaway: a small group of cyclists working incredibly hard to stay clear of the main peloton, attempting to reach the final kilometres of the stage with their advantage intact and a stage win in sight. And yet, breaks are caught more often than not, with the effects of wind resistance and strength-in-numbers giving the peloton a massive advantage.
Over the last seven years, in roughly 140 stages, just 39 breakaways have succeeded. With almost 1,500 riders involved in a breakaway during this time, each individual has had an average probability of just 2.6% of winning a stage, or roughly one in 38. Considering the Tour de France is only 21 stages long, and some of those stages (the time trials) give no opportunity for a break, why bother at all?
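The arithmetic behind those figures can be checked in a few lines. This is only a sketch using the rounded numbers quoted above ("just 39", "almost 1,500", "roughly 140"), so the results are approximate, and the variable names are purely illustrative.

```python
# Rounded figures quoted in the article, not exact counts from the dataset.
successful_breaks = 39
riders_in_breaks = 1500
stages = 140

# Each successful break produces exactly one stage winner, so the average
# chance per breakaway rider is successful breaks divided by riders involved.
win_rate_per_rider = successful_breaks / riders_in_breaks

# Share of all stages in the period that were won from the break.
break_success_rate = successful_breaks / stages

print(f"Average win rate per breakaway rider: {win_rate_per_rider:.1%}")
print(f"Share of stages won by a break: {break_success_rate:.0%}")
```

Note that the second figure uses all stages as the denominator, including time trials where no break is possible, so it slightly understates the success rate on stages where a break can actually form.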
Other than the obvious – trying to win the stage – there are other advantages to getting into a break. You might want to gain publicity for your team sponsor, be well placed near the front of the race to help a team leader later in the stage, or earn intermediate sprint or king of the mountains points for the minor jersey competitions. But it is unquestionably of interest to know which breaks to try to get into, and which have no chance of succeeding, in order to build a strategy that beats the odds.
Data provided by Peter Northall and James Baker of Frontier Economics covers a number of properties of breakaways that might influence whether one wins a stage:
– Stage number: By later stages of the Tour, riders that have fallen out of contention for the overall standings might instead target stage wins.
– Terrain: Sprinters’ teams target stage wins on flat stages, whereas on mountain stages, the focus is instead on gaining time in the general classification. These uphill stages are also slower, which reduces the benefits of the peloton drafting in a big group.
– Length of the stage, and when the break forms.
– Size and composition of the break: A larger break creates more of a shelter from the wind, reducing the peloton’s advantage.
– Whether the stage is after a rest day.
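To make the list above concrete, here is one way a single stage's breakaway could be recorded. This is a hypothetical layout, not the actual structure of the Frontier Economics dataset: every field name, the three terrain categories and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Terrain(Enum):
    # Terrain categories mentioned in the article; the real dataset may use others.
    FLAT = "flat"
    MEDIUM_MOUNTAIN = "medium mountain"
    HIGH_MOUNTAIN = "high mountain"


@dataclass(frozen=True)
class BreakawayRecord:
    """One stage's first clear-cut breakaway (all field names are hypothetical)."""
    stage_number: int
    terrain: Terrain
    stage_length_km: float
    break_formed_at_km: float
    riders_in_break: int
    grand_tour_stage_winners: int    # riders in the break with a Grand Tour stage win
    stage_wins_by_those_riders: int  # total Grand Tour stage wins held by those riders
    after_rest_day: bool
    break_won: bool

    @property
    def km_led(self) -> float:
        """Distance from the break forming to the finish line."""
        return self.stage_length_km - self.break_formed_at_km


# Made-up values for illustration only; this is not a real stage.
example = BreakawayRecord(
    stage_number=17,
    terrain=Terrain.HIGH_MOUNTAIN,
    stage_length_km=180.0,
    break_formed_at_km=10.0,
    riders_in_break=12,
    grand_tour_stage_winners=3,
    stage_wins_by_those_riders=5,
    after_rest_day=False,
    break_won=True,
)
print(example.km_led)
```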
It is useful to have a look at how these factors have affected breakaway success in recent years. The terrain has a significant impact: there has only been one successful break on a flat stage since 2013. This was stage 19 in 2017, which fell between a run of seven successive mountain stages and a race-deciding time trial. As a result it may have been treated as a bit of a rest day by the teams chasing the yellow jersey. In contrast, breakaways won over half of high mountain stages, and almost half of medium mountain stages.
The length of a stage and how many kilometres the break takes to form seem to have less of an impact, with no real pattern to when breaks succeed (the green dots in the diagram below). Most breaks lead for almost the entire stage (possibly due to the definition used, which is the first clear-cut breakaway of the day) – especially in longer stages. The exception to this is a cluster of very late breakaways in shorter, flatter stages (circled below) – these all have the same profile and may have been doomed because the sprinters’ teams refused to concede enough control to let a strong breakaway get a hold.
The number of riders in a break has an obvious effect, with larger breaks normally succeeding (there were 28 breaks comprising more than 20 riders, and 86% of them succeeded). When considering the strength of the riders in the break, there are two main criteria considered in the Frontier Economics data – how many Grand Tour stage winners are present, and how good they are, measured by how many stages they have each won. The second criterion is biased towards sprinters (who win stages in bunch sprints, rather than getting involved in breakaways), but regardless, the composition of the break does not have much of an impact – especially compared to the influence of the size of the break.
As seen below, the exception to this is in small breaks, which tend to only succeed when full of talent (i.e. close to the dashed line):
Finally, the break tends to succeed much more often than usual after a rest day: seven out of 14 such stages were won by the break. However, this may just be because half of these were high mountain stages, in which the break tends to do better.
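One way to check whether a rest-day effect is really a terrain effect in disguise is to compare success rates within each terrain type rather than overall. The sketch below does this on invented toy records, not the Frontier Economics data. The toy data is deliberately constructed so that rest days make no difference within a terrain type, yet still look helpful overall because rest-day stages are mostly mountainous.

```python
from collections import defaultdict

# Invented toy records: (terrain, after_rest_day, break_won).
stages = (
    [("high mountain", True, True)] * 2
    + [("high mountain", True, False)] * 2
    + [("high mountain", False, True)] * 2
    + [("high mountain", False, False)] * 2
    + [("flat", True, False)] * 1
    + [("flat", False, False)] * 8
)


def success_rate(records, key):
    """Share of stages won by the break, grouped by key(record)."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [wins, stages]
    for record in records:
        tally = tallies[key(record)]
        tally[0] += int(record[2])
        tally[1] += 1
    return {group: wins / total for group, (wins, total) in tallies.items()}


# Overall comparison: rest-day stages look much better for the break.
by_rest_day = success_rate(stages, key=lambda r: r[1])

# Stratified comparison: within each terrain the gap disappears.
by_terrain_and_rest = success_rate(stages, key=lambda r: (r[0], r[1]))

for group, rate in sorted(by_rest_day.items()):
    print(f"after rest day={group}: {rate:.0%}")
for group, rate in sorted(by_terrain_and_rest.items()):
    print(f"{group}: {rate:.0%}")
```

With only 14 rest-day stages in the real data, a stratified table like this would be very thin, which is one reason to fit a single model that accounts for all the properties at once.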
So what are the most important factors?
Because the outcome is binary (either the break succeeds, or it does not), we can use logistic regression to model it, giving a probability that the break succeeds in any given stage. The coefficients of each property in the model will tell us what is important when determining if a break is successful or not – the larger the coefficient in absolute value, the more that property affects the odds of the break succeeding.
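For readers who have not met logistic regression, the sketch below fits one from scratch in plain Python. It is not the model or the data used in this article: the stages are synthetic, generated from made-up effects for terrain and break size, and the fitting routine is simple gradient descent rather than whatever a statistics package would use. The features are standardised first, which is what makes the size of one coefficient comparable with another.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardise(column):
    """Rescale a feature to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


def fit_logistic(rows, outcomes, learning_rate=0.5, epochs=1000):
    """Fit an intercept and weights by batch gradient descent on the mean log-loss."""
    n, k = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, target in zip(rows, outcomes):
            predicted = sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))
            error = predicted - target
            grad_b += error
            for j in range(k):
                grad_w[j] += error * row[j]
        intercept -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


# Synthetic stages with invented effects (assumptions, not estimates from real data).
random.seed(0)
n_stages = 300
is_mountain = [random.choice([0, 1]) for _ in range(n_stages)]
break_size = [random.randint(2, 30) for _ in range(n_stages)]
break_won = [
    int(random.random() < sigmoid(-3.0 + 2.0 * m + 0.1 * s))
    for m, s in zip(is_mountain, break_size)
]

rows = list(zip(standardise(is_mountain), standardise(break_size)))
intercept, (w_terrain, w_size) = fit_logistic(rows, break_won)

print(f"terrain coefficient: {w_terrain:.2f}")
print(f"size coefficient:    {w_size:.2f}")
print(f"predicted success on an average stage: {sigmoid(intercept):.0%}")
```

Because the features are standardised, `math.exp` of a coefficient gives the factor by which the odds of success are multiplied when that feature rises by one standard deviation, with the other held fixed.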
Interestingly, the only two properties of any significance that come out of the model are (1) the terrain and (2) the number of riders in the break, with terrain being twice as important as size.
So when deciding which break to join, there mightn’t be as much to consider as conventional wisdom would suggest. The strength of the other riders in the break, how much of the race is left, how long the stage is overall – these can have an effect, but not a significant one, and not nearly as much as the terrain or size of the break. The modelling also suggests that the fate of the break really lies in the peloton’s legs, rather than the breakaway’s – the peloton has the power to decide whether or not to bring the break back, based on the wider tactical considerations of each team.
To get a bit more detail, it would have been better to use data that includes these considerations, as well as more detailed profiles of each stage – for example, the current GC standings, the amount of climbing in the stage, and whether it is a summit finish or not.
Despite these limitations, what does the model say about this year’s Tour?
Riders should aim to join a break in the high mountain stages towards the end of the race, when the GC riders will most likely be focusing on each other and won’t care about the break as much. And they should hope this break is as large as possible – as long as they back themselves to beat the rest of the breakaway!
Looking at the actual profile of the stages in the 2020 Tour gives an extra level of insight missing from the model used. For example, stage 16, which is never flat and has a steep ascent well-suited to an attack 20 km from the end, seems more likely to have a breakaway winner than the following stage, which might be too tough for a break to beat the top GC riders.
But as discussed, much of it depends on the mood of the peloton and how willing they are to let a break go the distance. The graph below puts the chances of a successful breakaway at around 25% for yesterday’s stage 6. As we now know, the peloton was happy for the break to stay away, leaving Alexey Lutsenko to finish nearly three minutes ahead of the bunch.
How will the rest of the stages unfold? We’ll have to wait and see.
About the author
Josh Silverbeck is a final-year Maths student at Oxford University. He enjoys playing sport, particularly water polo and cycling, and has spent the summer writing articles analysing data from European football and now the Tour de France.