Peter Sagan and Simpson’s Paradox: How the ‘worst’ sprinter keeps winning the green jersey

by Jamie Jowett


Although he did not win a stage, Peter Sagan (Tinkoff-Saxo) was undoubtedly one of the most impressive performers at this year’s Tour de France. His remarkable consistency (12 top-10 finishes) earned him the green ‘sprinter’s jersey’, despite changes to the points classification designed to favour the ‘pure sprinters’.

Jamie Jowett has crunched the numbers on Sagan’s performances at the past few Tours de France to explain why the Slovakian might be a prime example of a statistical oddity known as Simpson’s Paradox and how this makes Sagan comparable to tennis ace Roger Federer.


Of the sprinters at the Tour de France, Peter Sagan is among the worst. In fact, Sagan has won fewer than 5% of the stages he has raced over the past four Tours de France.

OK, with four consecutive green jerseys to his name, Sagan is not the worst at all. Far from it: he’s extraordinary. But he may just be a classic example of Simpson’s Paradox.

Simpson’s Paradox in sport was explained in a fascinating piece in The Atlantic by Ryan Rodenberg, in which tennis star Roger Federer was described as “having the worst record amongst players active since 1990 in so-called ‘Simpson’s Paradox’ matches”: matches where the loser wins more points than the winner.

Rodenberg noted that Federer’s 14% success rate in such matches was the worst of all 72 players who had competed in them.

In research originally published in the International Journal of Performance Analysis in Sport, Rodenberg and fellow researchers Jeff Sackmann and Ben Wright analysed 61,000 ATP matches from 1990 to 2014. They found that Federer lost more points and games than any other player in the ATP top 100, and yet, partly because he is one of the most competitive players around, he also wins more matches than almost anyone else.

As such, Federer is cited as a classic example of Simpson’s Paradox, a statistical oddity in which a trend observed within multiple groups (losing points and games) reverses when those groups are combined (winning plenty of matches).
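To see how that kind of reversal can happen, here is a minimal Python sketch with invented scores (they are not taken from Rodenberg’s data). It uses games rather than points for simplicity, but the logic is the same: games only count through the sets they belong to, so a player can lose the overall game count and still win the match. The function names are hypothetical helpers written for this illustration.

```python
from typing import List, Tuple

# One set score: (games won by player A, games won by player B).
SetScore = Tuple[int, int]


def sets_won(sets: List[SetScore]) -> Tuple[int, int]:
    """Count the sets each player won; a set goes to whoever took more games."""
    a = sum(1 for ga, gb in sets if ga > gb)
    b = sum(1 for ga, gb in sets if gb > ga)
    return a, b


def games_won(sets: List[SetScore]) -> Tuple[int, int]:
    """Total games each player won across the whole match."""
    return sum(ga for ga, _ in sets), sum(gb for _, gb in sets)


def is_simpson_match(sets: List[SetScore]) -> bool:
    """True if the match winner (by sets) took fewer games than the loser."""
    a_sets, b_sets = sets_won(sets)
    a_games, b_games = games_won(sets)
    return (a_sets > b_sets and a_games < b_games) or (
        b_sets > a_sets and b_games < a_games
    )


# Hypothetical best-of-three match, scores invented for illustration.
example_match = [(6, 4), (0, 6), (6, 4)]
print(sets_won(example_match), games_won(example_match), is_simpson_match(example_match))
# (2, 1) (12, 14) True
```

Here player A wins two sets to one while losing 12 games to 14, so player B is the loser who won more: a ‘Simpson’s Paradox’ match in the sense used above.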

So how does this relate to Peter Sagan? Consider how the young Slovak has performed against his green jersey rivals in the past four Tours de France (2012 was Sagan’s debut Tour).


Clearly, winning a green jersey takes more than big balls.

Compared with those of most riders in the Tour, Sagan’s results are excellent, but ranked against his green jersey rivals by stage wins, they are not great. He has had no stage wins since 2013, while Marcel Kittel has eight wins in two years and Andre Greipel has nine wins in four years. Sprint king Mark Cavendish also has more wins than Sagan, despite being unlucky with serious crashes in 2013 and 2014 and illness in 2015.

But while Sagan might ‘lose’ plenty of sprints (like Federer losing plenty of points), he’s clearly successful when it comes to winning the ‘sprinter’s jersey’ (like Federer winning lots of matches). How?

Across the sprint and medium-mountain stages of the past four Tours de France, Sagan has come out ahead of his green jersey rivals head-to-head, with the sole exception of Alexander Kristoff, who just shaded Sagan 6-5 in 2014. Against Marcel Kittel, Sagan was 7-4 and 6-4 in the two Tours they have raced together. Even in Andre Greipel’s all-conquering 2015 Tour de France, Sagan beat him 6-5 overall on sprint and medium-mountain stages.

Like Roger Federer, Sagan clearly doesn’t drop games easily. Helped by his youthful bravado and innate bike-handling skills, he has placed in the top 10 in 96% of the Tour de France sprint stages he has raced, and in the top three in 50% of them. In 2015 he went further still, finishing on the podium in 80% of the sprint stages.

All this has been achieved without the luxury of the big sprint trains that Lotto Soudal, Giant-Alpecin and Etixx-Quickstep have. Without teammates’ wheels to follow in the finale, Sagan has to work harder to be positioned at the pointy end as the bunch comes under the 1km-to-go banner. And yet, at this year’s Tour de France, Sagan was in 10th place or better with 1km to go on every one of the sprint stages.

The points classification is known as the sprinter’s jersey because most points are allocated to the sprint (flat) stages. Between 2011 and 2014, 45 points were awarded for winning a designated ‘flat’ stage, and 30 points for winning a ‘medium mountain’ stage. This changed in 2015, when the winner of a flat stage received an additional five points (50 in total), while the prize for winning a medium-mountain stage stayed the same.
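As a rough sketch of how far the 2015 change tipped the balance, the stage-winner figures quoted here can be compared directly. Only the winner’s points given in this article are used; points for lower placings are deliberately left out rather than guessed, and the dictionary and function names are hypothetical.

```python
# Stage-winner points as quoted in this article; lower placings are omitted.
WINNER_POINTS = {
    "2011-2014": {"flat": 45, "medium_mountain": 30},
    "2015": {"flat": 50, "medium_mountain": 30},
}


def flat_premium(era: str) -> float:
    """How many medium-mountain wins one flat-stage win is worth in a given era."""
    points = WINNER_POINTS[era]
    return points["flat"] / points["medium_mountain"]


for era in WINNER_POINTS:
    print(era, round(flat_premium(era), 2))
# 2011-2014 1.5
# 2015 1.67
```

On these figures alone, a flat-stage win went from being worth one and a half medium-mountain wins to roughly one and two-thirds.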

So how exactly does Sagan win the green jersey, particularly now that the points classification has been tipped in favour of the ‘pure sprinters’?

Peter Sagan (Tinkoff-Saxo) during stage 19 of the 2015 Tour de France, from Saint-Jean-de-Maurienne to La Toussuire - Les Sybelles (138km), on Friday 24 July 2015. Photo: Dion Kerckhoffs/Sabine Jacob/Marketa Navratilova/Cor Vos © 2015

Given that Sagan has now ridden 71* of his Tour stages (nearly 85%) in the green jersey, the paradox of Sagan being the ‘worst sprinter who wins the most points’ is better explained by his being the strongest climber among the sprinters, especially in the second and third weeks.

Sagan clearly races the Tour de France strategically, which will no doubt surprise those who sometimes disparage his tactics. He is well aware that second place in a medium-mountain stage is worth nearly the same as seventh in a flat stage sprint, while two fifth places in medium-mountain stages earn nearly as much as second place in a flat stage.

Sagan’s high placings in medium-mountain stages bring an added bonus: his rivals usually take no points on the same day. This relative points differential has been as high as 30 points at times. On 12 stages over the past four Tours, Sagan has scored points while each of his three nearest rivals earned nothing.

Only twice in four years has Sagan suffered a ‘negative differential’, that is, a stage on which he failed to score but one of his rivals did. In 2015, Sagan failed to score on only three stages, and on each of those he was in the gruppetto alongside all of his rivals. By comparison, Greipel had four stages without a point, Cavendish six, Degenkolb four and Coquard five.
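The ‘differential’ idea can be made precise with a short Python sketch. The definition used here (Sagan’s points minus the best score among the listed rivals on that stage) is an assumption for illustration, as are the rider labels and the stage scores, which are invented rather than taken from official results.

```python
from typing import Dict


def classify_stage(sagan: int, rivals: Dict[str, int]) -> str:
    """Label one stage by comparing Sagan's points with his rivals'.

    'shutout'  : Sagan scored and none of the listed rivals did
    'negative' : Sagan scored nothing but at least one rival scored
    'neutral'  : anything else (both scored, or nobody scored)
    """
    rivals_scored = any(points > 0 for points in rivals.values())
    if sagan > 0 and not rivals_scored:
        return "shutout"
    if sagan == 0 and rivals_scored:
        return "negative"
    return "neutral"


def differential(sagan: int, rivals: Dict[str, int]) -> int:
    """Assumed definition: Sagan's points minus the best rival's points that stage."""
    return sagan - max(rivals.values(), default=0)


# Hypothetical stages; rider labels and points are invented for illustration.
stages = [
    {"sagan": 30, "rivals": {"rival_1": 0, "rival_2": 0, "rival_3": 0}},
    {"sagan": 20, "rivals": {"rival_1": 50, "rival_2": 16, "rival_3": 0}},
    {"sagan": 0, "rivals": {"rival_1": 0, "rival_2": 0, "rival_3": 0}},  # all in the gruppetto
    {"sagan": 0, "rivals": {"rival_1": 12, "rival_2": 0, "rival_3": 0}},
]

labels = [classify_stage(s["sagan"], s["rivals"]) for s in stages]
diffs = [differential(s["sagan"], s["rivals"]) for s in stages]
print(labels)  # ['shutout', 'neutral', 'neutral', 'negative']
print(diffs)   # [30, -30, 0, -12]
```

Counting the labels gives the tallies described above (stages where only Sagan scored, and ‘negative differential’ stages), while the numeric differential shows how a single stage like the first invented one can open a 30-point gap.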


Perhaps part of the explanation for Peter Sagan’s success lies outside cycling, in another paradox.

In his book ‘Uncertainty’, author Jonathan Fields referred to the Ellsberg Paradox: the idea that people tend to choose a known probability of winning over an unknown one, even when the known probability is low and the unknown probability could turn out to be a guaranteed win.

Sagan does not strike me as someone who would be frozen into inaction by uncertainty; rather, he appears to choose the unknown risk so that he always has a chance to win. However, the 2015 Tour de France gave Sagan his lowest winning margin in the green jersey competition to date: he finished ‘only’ 66 points ahead of second place.

In 2012, Sagan could ‘only’ manage fourth at Milan-San Remo, second at Ghent-Wevelgem and third at the Amstel Gold Race. In 2013 he was second at Strade Bianche, Milan-San Remo and the Ronde van Vlaanderen. In 2014 he was second again at Strade Bianche and third at Ghent-Wevelgem. In 2015 he was fourth at both Milan-San Remo and the Ronde van Vlaanderen.

Taking a risk in professional bike racing is not just a matter of potentially crashing by, for instance, squeezing through a tight gap in the finishing straight. At the Tour de France, risk also means getting in the breakaway, bridging across to the leaders, or simply sprinting hard day after day, spending so much energy that a similar effort cannot be repeated in the next day’s stage. By this definition, Sagan rode an exceptionally risky Tour de France.

Rodenberg’s paper observed that “there are two non-mutually exclusive explanations for Federer’s curious results”. The first was that his opponents adopt far riskier play against him, such as aggressive second serves and passing shots on returns of serve, which bring either big rewards or heavy losses, with nothing in between.

Sagan’s heartbreaking second place behind Ruben Plaza on stage 16 of this year’s Tour looks very much like that. While Sagan was forced to drag his breakaway companions along as they refused to work with him, Plaza was ahead on the dangerous descent, taking big risks. Sagan put on a high-speed descending masterclass on the Col de Manse and said later: “I knew I had to try as hard as I could in the descent even if it meant dying”.

Another quirk of tennis’s scoring system gives rise to Rodenberg’s second proposition: many players ‘tank’ return games when they are a break up, conceding points against serve to save energy for their own service games. Federer, by contrast, doesn’t give up the fight against the serve, often taking his opponent to deuce. With 16 second places in Tour de France stages to his name, Sagan, like Federer, can never be said to tank.

By crunching the numbers, Rodenberg posited that he had found “mathematical proof of… unequalled in-match competitiveness” in the case of Roger Federer. As this investigation suggests, the same can surely be said of Peter Sagan.

* Technically, Sagan did not always wear the green jersey even when he had earned it (on occasion he wore the white jersey instead), and on the first stage of each Tour riders do not wear the coloured jerseys.