Here’s what happens if you simulate the 2021 Tour de France 100 times

by Matt de Neef


Predicting the outcome of bike races is a fraught business. It’s something we do regularly here at CyclingTips when we write our race previews and while we’re sometimes on the money, we usually aren’t. Bike racing is just too complex, with too many confounding variables, to predict anything with any real certainty.

But what if there was another way to predict how a bike race might unfold? What if we could run a simulation to do that job for us? And if we could do that, what would that simulation tell us about how the 2021 Tour de France was going to unfold?

Well, wonder no more.

Simulating the Tour de France

Pro Cycling Manager (PCM) is a videogame series that puts you in the shoes of a team manager. Over the course of a racing season (or multiple seasons) you pick your riders for each race then take control of those riders as they tackle each event on the calendar. (If you’re keen to learn more about the PCM franchise, follow the links for our reviews of the 2013, 2014, 2018 and 2020 editions.)

PCM can also be used to simulate races. That is, for any given race, you have a choice: play out that race by controlling your riders, or let the game simulate what would happen, taking into account the strengths of each rider, the race situation, and the terrain.

Which brings us to what Reddit user “ser-seaworth” put together over the past week or so. In a post in the Peloton subreddit – a Reddit community focused on the pro racing scene – ser-seaworth explained the project as follows:

“Inspired by every single rider who makes the early break on a flat stage, I decided to embark on a futile journey of my own: to simulate the entire Tour de France 100 times.”

Ser-seaworth took last season’s version of PCM and added a community-created mod – “WorldDB 2021” – to update the game’s database to reflect the 2021 racing season. Then, over the course of a week, they simulated the Tour 100 times over, using the same course and startlist as the real thing. That’s 2,100 simulated stages, many, many hours of waiting for various loading screens, and plenty of time spent cataloguing all the results the game generated.

Let’s dive into some of the more interesting results.

Each rider in PCM is given a rating out of 100 for a total of 13 different skills. In the case of the WorldDB 2021 mod, these stats are determined by members of the PCM community who helped create the mod.

The race overall

Over the course of 100 simulations, nine different riders took at least one overall victory:

  • Tadej Pogačar: 58 wins
  • Primož Roglič: 24 wins
  • Richard Carapaz: 4 wins
  • Richie Porte: 4 wins
  • Enric Mas: 3 wins
  • Geraint Thomas: 3 wins
  • Pello Bilbao: 2 wins
  • Miguel Ángel López: 1 win
  • Simon Yates: 1 win

You’ll note that only 18 of the 100 Tours weren’t won by Pogačar or Roglič. Of those 18, 11 were won by an Ineos rider.
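Those tallies are easy to check with a few lines of Python. This is only a sketch: the win counts are the ones listed above, while the variable names and the grouping of Carapaz, Porte and Thomas as the Ineos riders are our own illustration, not anything taken from ser-seaworth's spreadsheet.

```python
# Overall wins per rider across the 100 simulations, as listed above.
overall_wins = {
    "Tadej Pogačar": 58,
    "Primož Roglič": 24,
    "Richard Carapaz": 4,
    "Richie Porte": 4,
    "Enric Mas": 3,
    "Geraint Thomas": 3,
    "Pello Bilbao": 2,
    "Miguel Ángel López": 1,
    "Simon Yates": 1,
}

# Groupings chosen for illustration: the two big favourites and the Ineos trio.
favourites = {"Tadej Pogačar", "Primož Roglič"}
ineos = {"Richard Carapaz", "Richie Porte", "Geraint Thomas"}

total = sum(overall_wins.values())
won_by_others = sum(w for rider, w in overall_wins.items() if rider not in favourites)
won_by_ineos = sum(w for rider, w in overall_wins.items() if rider in ineos)

print(total, won_by_others, won_by_ineos)
```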

Based on how the real-life Tour is unfolding, Pogačar’s dominance in these simulations doesn’t feel too far removed from reality.

The podium

Even in the 18 Tours that neither Pogačar nor Roglič won, the two Slovenians were very likely to finish on the podium. In fact, Pogačar was in the top three in 87 of the 95 simulated Tours he finished, and Roglič was on the podium 78 times.

Others with a high number of podium finishes were:

  • Richie Porte: 36 times
  • Richard Carapaz: 24 times
  • Geraint Thomas: 15 times
  • Enric Mas: 11 times

As ser-seaworth noted, this simulation seems to answer the question that was on many people’s lips pre-Tour: who should lead Ineos Grenadiers? “Now the Ineos picking order becomes apparent: Richie Porte, Richard Carapaz, and Geraint Thomas, in that order,” they wrote. “His 19 second-place finishes are where Porte makes the difference, making him the most reliable Ineos leader in this simulation.”

As we know, the real-life 2021 Tour hasn’t quite gone as planned for Porte (or indeed for Thomas). Carapaz seems a good bet for the overall podium though.

The Average Tour

If we average out the results from all 100 of ser-seaworth’s simulations, we get what they call “The Average Tour”. By removing the variance that affects any individual simulation, this gives us something approaching the “correct” result for a simulated 2021 Tour, based on the stats of every rider in the race.
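One way to picture the averaging step is the rough Python sketch below. It rests on an assumption of ours rather than on ser-seaworth's actual method: it simply ranks riders by their mean finishing position across simulations. The riders and placings in it are invented placeholders, not results from the project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placings: one dict per simulated Tour, mapping rider -> final GC position.
# "Rider A" and friends are placeholders, not real results.
simulations = [
    {"Rider A": 1, "Rider B": 2, "Rider C": 3},
    {"Rider A": 2, "Rider B": 1, "Rider C": 3},
    {"Rider A": 1, "Rider B": 3, "Rider C": 2},
]


def average_tour(sims):
    """Rank riders by mean finishing position (lower is better)."""
    positions = defaultdict(list)
    for result in sims:
        for rider, place in result.items():
            positions[rider].append(place)
    averages = {rider: mean(places) for rider, places in positions.items()}
    return sorted(averages, key=averages.get)


print(average_tour(simulations))
```

A fuller version would also need a rule for riders who abandon in some simulations, which this sketch ignores.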

Taking into account how often each rider finished in each position across 100 simulations, here’s the top 10 in The Average Tour:

1. Tadej Pogačar
2. Primož Roglič
3. Richie Porte
4. Richard Carapaz
5. Enric Mas/Simon Yates
6. Wout Poels/Enric Mas
7. Geraint Thomas
8. Miguel Ángel López
9. Bauke Mollema/Simon Yates
10. Rigoberto Uran

Here are the riders that aren’t listed above, but that sit within the top 10 overall in the real Tour (after 10 stages):

  • Ben O’Connor (second)
  • Jonas Vingegaard (fourth)
  • Wilco Kelderman (seventh)
  • Alexey Lutsenko (eighth)
  • Guillaume Martin (ninth)
  • David Gaudu (10th)

It will be interesting to see how much the real-world Tour and Average (simulated) Tour top-10s converge by the time the real riders reach Paris.

Green jersey

Ser-seaworth didn’t just look at the race overall. Here’s who won the green jersey across the 100 simulations:

  • Caleb Ewan: 55 times
  • Arnaud Démare: 26 times
  • Wout van Aert: 8 times
  • Mads Pedersen: 5 times
  • Tim Merlier: 3 times
  • Mathieu van der Poel: 1 time
  • Peter Sagan: 1 time

We’ll never know if Ewan would have won green in this year’s Tour but, given the sprinter-friendly parcours, it seems likely he would have been a strong contender had he not crashed out early.

The fact that seven-time points classification winner Sagan only won the green jersey once in 100 simulations is noteworthy. Ser-seaworth offered this explanation: “PCM can’t factor in Sagan’s intrinsic motivation to spend four hours in the break on a mountain stage for 15 intermediate sprint points, as he did in previous years.”

KOM jersey

There was more weirdness afoot in the battle for polka dots. There, Wout van Aert won the climber’s classification a total of 38 times out of 100 – easily the most prolific winner. Guillaume Martin was second with eight wins, ahead of Alexey Lutsenko with six.

“I have no idea how or why this happens, but seeing Wout win several sprint stages as well as the KOM jersey is pretty much par for the course,” ser-seaworth wrote. “Perhaps PCM’s simple analysis of Wout’s stats has come to a conclusion that we haven’t been able to see yet. [Michael] Matthews and [Mathieu van der Poel] also managed to win the climber’s jersey, so it’s definitely not just a Wout thing either.”

White jersey

The best young rider classification was all about Tadej Pogačar. The 22-year-old Slovenian won white 91 times out of 100 … or 91 times out of 95 when you consider the five simulations in which Pogačar didn’t finish the Tour (riders can crash in PCM, suffering injuries of various kinds).
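The difference between those two ways of counting is simple to express in code. The sketch below uses only the figures quoted above; the variable names are ours.

```python
simulations = 100
white_jersey_wins = 91
did_not_finish = 5  # simulations in which Pogačar abandoned the race

# Only Tours he actually finished count towards the second rate.
finished = simulations - did_not_finish

rate_all = white_jersey_wins / simulations
rate_when_finished = white_jersey_wins / finished

print(f"{rate_all:.1%} of all Tours, {rate_when_finished:.1%} of Tours finished")
```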

Others to win white were Sergio Higuita (four times), David Gaudu (three times), and Lucas Hamilton (twice).

These results feel pretty spot on. It’s hard to imagine a scenario where Pogačar finishes the Tour de France but doesn’t win the white jersey.

Stage wins

Over the course of 2,100 simulated stages, 107 different riders won at least one stage. The riders with the most wins were:

  • Caleb Ewan: 334 wins (15.9% of all stages)
  • Tadej Pogačar: 270 wins (12.9% of all stages)
  • Wout van Aert: 189 wins (9% of all stages)
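The percentages above are each rider's wins divided by the 2,100 simulated stages (21 stages in each of 100 Tours). A minimal Python sketch of that calculation, with names of our own choosing:

```python
STAGES_PER_TOUR = 21
SIMULATIONS = 100
total_stages = STAGES_PER_TOUR * SIMULATIONS  # 2,100 simulated stages

# Stage-win totals for the three most prolific winners, as listed above.
stage_wins = {"Caleb Ewan": 334, "Tadej Pogačar": 270, "Wout van Aert": 189}

# Share of all simulated stages won by each rider.
shares = {rider: wins / total_stages for rider, wins in stage_wins.items()}

for rider, share in shares.items():
    print(f"{rider}: {share:.1%} of all stages")
```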

Ser-seaworth notes that, interestingly, in 100 simulated editions of the Tour, Mark Cavendish won a grand total of just one stage. “Eddy Merckx’ record is very safe in the virtual world of PCM,” they wrote. “Davide Ballerini, [Cavendish’s] teammate at Deceuninck, was the chosen son and achieved 14 stage wins instead.”

Oddities

When you’re simulating more than 2,000 days of racing, you’re bound to come across a few anomalies and oddities. Here’s a selection of what ser-seaworth found.

  • Anthony Delaplace (Arkéa Samsic) winning a total of five stages.
  • Omar Fraile (Astana-Premier Tech) winning on the Champs-Élysées.
  • Lorenzo Rota (Intermarché-Wanty-Gobert Matériaux) taking yellow in one simulation by winning stage 1.
  • Magnus Cort Nielsen (EF Education-Nippo) finishing seventh overall in one simulation.
  • Team B&B Hotels being the only team not to win a single stage across the entire 100 simulations.

So what?

So what can we take from this whole project? Does it contain any useful data that can help us predict the outcome of real-world bike races?

Well, probably not. In PCM, each rider’s performance is governed by their 13 skill scores and while this level of granularity does paint a fairly detailed picture of each rider and their strengths, it’s far from a complete picture. There are countless other variables in bike racing that a simulation will never be able to account for. Things like the complex suite of human emotions, which fluctuate from moment to moment, and can significantly affect a rider’s motivation on a given day. Or the effect weather conditions can have on a race.

And even if every single variable in a bike race was accounted for in the game, it would be impossible to assign the right values to every single variable, for every single rider. It’s difficult enough to accurately rank each rider across 13 skills relative to each other rider in the race.

But if we’re judging the value of this project by how it compares to reality, we’re kind of missing the point. Ser-seaworth summed it up eloquently at the bottom of their Reddit post:

“No single simulation has been able to get more than a few correct predictions,” they wrote. “Does this mean reality is too crazy to be simulated? Am I disappointed that PCM couldn’t get it right? Not at all.

“What I found between the lines of my spreadsheet, was a newfound appreciation for the complexity of cycling. Reducing each rider to 13 numbers gives a fair approximation, but the result is about as accurate as your average SWL [Stage Winners League] entry or PelotonMod Prediction Thread: not very accurate. There has to be more.

“Cycling, and especially the Tour, is just much more than PCM can ever hope to accurately simulate. It’s hopes and dreams, and life-long goals. It’s fatigue, and motivation, and fighting the elements. It’s teamwork, until suddenly you’re all alone. It’s crashing, and getting back up.

“In how many simulations did Roglic get back on his bike after a crash? In how many simulations would Philipsen ever get a chance to sprint for his own chances? How often did a rider like Colbrelli suffer all day for 15 points at the finish line? Are these digital avatars also motivated to glory by lost loved ones? How many PCM characters cried after beating the time limit?”

Projects like this don’t have to offer some hyper-realistic reproduction of reality to be worthwhile. They don’t have to help us predict what’s going to happen in the real-life Tour. It’s enough for them to just be cool, and interesting, and to provide a different perspective on things.

Chapeau, ser-seaworth.
