Can Zwift solve its cheating problem?
If you’ve ridden Zwift, and certainly if you’ve ever raced on the platform, you’ve likely seen them. The amateurs blowing past with Cancellara watts; the B-category race winners who, based on their apparent power output, should be in talks with Ineos.
Cheaters. Some intentional, some unintentional, all skewing the numbers and getting ahead.
The very concept of cheating to win an online bicycle competition in which no party ever actually moves anywhere feels head-shakingly absurd. The stakes, as Jeff Goldblum once said, are medium.
Or are they? For pros, who are now racing on Zwift, providing value to their sponsors via Zwift, the accuracy of Zwift races suddenly matters to their livelihood. For the rest of us, well, the whole point of Zwift is gamification and competition. If you can’t count on a fair fight, that ruins the game, and nobody will want to play anymore.
So what’s Zwift doing about it?
Zwift cheating falls into two broad categories – intentional and unintentional. Underneath these two is a plethora of available methods, ranging from the simple to the absurd.
Intentionally, you could submit a fake weight, which skews Zwift’s watt/kilogram (and thus speed) calculations, for example. This is the late-’90s EPO of Zwift doping – ubiquitous, effective, and almost impossible to detect.
Or you could hack your smart trainer to increase your power readings. This has actually happened. Actual people did this.
You could ride an ebike, or stick a 500-watt drill into your crankset and have at it. This is rare. The Femke Van den Driessche of Zwift doping.
In the gray zone between intentional and unintentional are lapses like forgetting to update your weight, or incorrectly calibrating your power meter or trainer. These are the testosterone-in-the-protein-powder of Zwift doping. Maybe not malicious, but you should have been paying attention.
All are a problem for Zwift. And with the exception of its upper-tier racing series, it’s not particularly concerned with intent.
An automated solution to amateur Zwift cheating
The problem of fake weight input, for most amateur Zwift competition, is not one that’s possible to fully solve. At least not yet. Dropping a few kilos off your avatar will make you go a bit faster, but you’re still not going to be fast enough to trigger Zwift’s current, or future, automated cheating detectors.
Automated cheating detectors? Indeed. Zwift already has them in place. Perhaps you’ve seen the Cone of Shame floating over a rider’s head, indicating that they’ve triggered Zwift’s ire. After all, the whole game is just data whizzing around, and with enough data you can solve lots of difficult problems. The systems are getting better all the time, according to multiple Zwift representatives I spoke to for this story.
It’s good that the automated systems are improving, because there are thousands of riders on Zwift on a peak day, and a huge number of those riders enter races. The volume is simply far too high for a human solution to Zwift cheating. The broad goal, from an amateur Zwifting perspective, is to prevent blatant sandbagging – when a rider who is clearly able to race well in the A or B category instead puts themselves into D, for example, just so they can win.
At the moment, Zwift is beta testing a new, much more powerful automated system. It’s currently restricted to a single event series in a single location (Crit City).
In broad terms, here’s how it works:
Zwift has power data on each rider, whether from a power meter, smart trainer, or estimated trainer power. It knows, for example, what your best 5-minute power was in the last two months of riding, and what your best 1-minute power was. And it knows that 5-minute power tracks pretty well with threshold power, and 1-minute power is a good marker of anaerobic capacity. In other words, it knows how fast you usually are.
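To make "best 5-minute power" concrete, here is a minimal Python sketch of how a best effort can be pulled out of a ride file. It is not Zwift's code. It assumes one power sample per second, and the function name and the sample ride are made up for illustration.

```python
from typing import Sequence


def best_average_power(watts: Sequence[float], window_s: int) -> float:
    """Highest mean power over any contiguous window of `window_s` samples.

    Assumes one power sample per second, which is a common recording rate
    but not something the article specifies.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if len(watts) < window_s:
        raise ValueError("ride is shorter than the requested window")

    # Sliding-window sum: add the newest sample, drop the oldest.
    window_sum = sum(watts[:window_s])
    best_sum = window_sum
    for i in range(window_s, len(watts)):
        window_sum += watts[i] - watts[i - window_s]
        best_sum = max(best_sum, window_sum)
    return best_sum / window_s


# Hypothetical ride: 10 min easy, 5 min hard, 1 min sprint, 5 min easy.
ride = [150.0] * 600 + [300.0] * 300 + [450.0] * 60 + [150.0] * 300
best_1min = best_average_power(ride, 60)   # the 60-second sprint
best_5min = best_average_power(ride, 300)  # sprint plus 4 min of the hard block
```

Run over two months of rides, the largest values per window would give the kind of 1-minute and 5-minute history described above.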
Threshold power, or FTP, is how Zwift sets up its race categories. If you can do 4+ watts per kilogram (W/kg) of rider weight, you belong in the As, for example.
The system creates triggers using a combination of historical 5-minute power and 1-minute power, the details of which Zwift won’t divulge.
So let’s say you try to sign up for a C race, which is intended for riders in the 2.5-3.1 W/kg threshold power range. But just last week you did a 5-minute effort that suggests your threshold power is far higher than that. Let’s say 4 W/kg.
When you try to sign up for the C race, Zwift will throw a pop-up onto your screen suggesting that you fight some folks your own strength. At least at first, it won’t force you to upgrade, though. But if you opt into the Cs anyway, Zwift will decrease your power within the game, throttling you back so that you’re no faster than the other Cs. And it will give you the Green Cone of Shame, which sits over your avatar’s head and tells everyone you’re in the wrong category.
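As a rough illustration of that flow, the Python sketch below checks an estimated threshold against a category ceiling and caps in-game power accordingly. Zwift has not published its triggers or its throttling rule, so everything here is an assumption: the C and A limits come from the figures above, the D and B limits are inferred from them, and the function names and the simple cap are hypothetical.

```python
# Category ceilings in W/kg. The article gives 2.5-3.1 for C and 4+ for A;
# the D and B limits, and how exact boundary values are treated, are assumptions.
CATEGORY_CEILING_WKG = {"D": 2.5, "C": 3.1, "B": 4.0, "A": float("inf")}


def watts_per_kg(watts: float, weight_kg: float) -> float:
    """Power-to-weight ratio, the number Zwift's categories are built on."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return watts / weight_kg


def is_sandbagging(threshold_wkg: float, category: str) -> bool:
    """True when the estimated threshold exceeds the chosen category's ceiling."""
    return threshold_wkg > CATEGORY_CEILING_WKG[category]


def throttled_watts(raw_watts: float, weight_kg: float, category: str) -> float:
    """Cap in-game power at the category ceiling (hypothetical throttle rule)."""
    cap = CATEGORY_CEILING_WKG[category] * weight_kg
    return min(raw_watts, cap)


# Hypothetical 75 kg rider whose recent efforts suggest a 300 W threshold.
rider_wkg = watts_per_kg(300.0, 75.0)             # 4.0 W/kg
flagged = is_sandbagging(rider_wkg, "C")          # above the C ceiling
in_game = throttled_watts(300.0, 75.0, "C")       # capped at roughly 232.5 W
```

The same arithmetic shows why a fake weight is so effective: entering 70 kg instead of 75 kg turns that 300 W into roughly 4.3 W/kg without the rider pedaling any harder.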
Zwift is otherwise remaining relatively tight-lipped regarding how, precisely, this system works. Where the triggers lie, for example. This isn’t too surprising. The World Anti-Doping Agency doesn’t tell athletes exactly how it tests, either. Doing so invites and accelerates the search for workarounds. There’s no timeline for taking the system out of beta, though a Zwift representative said it looks “promising.”
The system currently in Crit City beta is more sensitive and more powerful than what is currently in effect across the rest of the game. “Right now, we’re just looking at the established limits in the power profiling work done by Andy Coggan and Hunter Allen,” said Jordan Rapp, Zwift’s senior game designer. “That’s the information that triggers our traditional ‘UH OH!’ flag, suggesting that we’re seeing an unexpected power value.”
“For the anti-sandbagging work, we’re using the same discrete CP (Critical Power) values but anchoring them to the existing limits for the community defined racing categories (ABCD),” Rapp said.
Those “Uh oh” triggers are already in action on Zwift, but they’re not perfect. That’s how pros like Ashleigh Moolman-Pasio got flagged and booted from races for superhuman power numbers. “You missed your calling as a pro cyclist,” Zwift told her via a bubble pop-up.
What about pro racing?
Higher-level racing gets a completely different treatment, more similar to what we see in outdoor racing. The solution, given the smaller scale, is more human and less algorithm.
At the effort’s core is an organization called ZADA, short for Zwift Accuracy and Data Analysis (and formerly short for Zwift Anti-Doping Agency, back when it was a volunteer, grassroots effort). Since March of 2019, ZADA has been contracted by Zwift to be the watchdog over Zwift racing. It’s the WADA or USADA of Zwift, basically.
In many ways, ZADA feels like it’s modeled after these anti-doping agencies that cycling fans are already familiar with. Like WADA and USADA, ZADA doesn’t spend much of its time and effort on amateur racing. These days, there are simply too many Zwift races for the small team of engineers and data scientists who work with ZADA to keep an eye on them all.
When major races occur, like the KISS series or the Tour for All — which included major WorldTour teams — ZADA steps into action.
The first step is simply watching the race. A ZADA representative will tune in and watch for superhuman efforts. They’ll also receive each rider’s data after the race – power outputs, weights, everything.
ZADA will then dig deeper into a select number of riders’ data files. Zwift calls this “verification.” A rider’s files are investigated, using known physiological parameters and data science tools, so that ZADA can determine if an effort is legitimate.
The top three in any major race are almost certain to go through the verification process, just as they would at a WorldTour race. And just like a random anti-doping test at major outdoor races, ZADA also inspects riders’ files from outside the top three.
“We often pick riders based on data that we believe is suspect,” said Stephen Chu, Zwift’s general counsel, who heads up the company’s cooperation with ZADA. “It’s not necessarily always suspect, but if we think that you did something in the race that is not normal, we’ll flag it and just say, ‘OK, you’re going to automatically get reviewed.’”
“If we’ve never heard of you, and it looks like you should be winning the Tour de France, we’re probably going to take a second look,” Chu said.
A second look begins with data analysis. Think of this as the A sample test, if we’re talking in outdoor racing terms. Zwift won’t say precisely what ZADA is looking for, and the clues will depend on the specific type of error or attempted cheating. Certain types of power spikes and dips, for example, tend to indicate a power meter out of calibration. Power that never fully returns to zero means the same. But much of the time there’s no such smoking gun, as there are almost as many ways to cheat, intentionally or accidentally, on Zwift as there are in the real world.
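One of those clues, power that never returns to zero, is simple enough to sketch in Python. This is only an illustration, not ZADA's method: the function name and the 5-watt noise tolerance are assumptions, and the check is only meaningful for a file in which the rider stopped pedaling at some point.

```python
from typing import Sequence


def never_returns_to_zero(watts: Sequence[float], floor_w: float = 5.0) -> bool:
    """True if no sample in the file drops to `floor_w` watts or below.

    `floor_w` is a hypothetical tolerance for sensor noise, not a ZADA value.
    """
    if not watts:
        raise ValueError("empty power file")
    return min(watts) > floor_w


# Hypothetical files: the same ride, with and without a 15 W offset.
healthy = [200.0, 180.0, 0.0, 0.0, 210.0]
drifting = [215.0, 195.0, 15.0, 15.0, 225.0]

healthy_suspect = never_returns_to_zero(healthy)
drifting_suspect = never_returns_to_zero(drifting)
```

A flag like this would only prompt a closer look. As the article notes, much of the time there is no such smoking gun.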
If that A sample test returns a questionable result, ZADA moves on to the equivalent of a B sample – confirmation. Can the rider in question actually produce the effort he or she did in the race?
To answer that question, Zwift sometimes sends riders to a local lab (at least in pre-COVID times). “We have a protocol, we send you to a local lab, and we find out if you can do those numbers,” Chu said.
Another option is to send the rider outdoors. “We do a Strava test,” Chu said. “We basically give you a protocol that you follow for going outside and doing it on your own.” A rider will have three days to head out and prove that they have the same power in the real world as they just showed on Zwift.
A final option, and the one used right now as riders can’t always head outside, is to do another indoor test. Specifically, riders have to tackle the Three Sisters course on Zwift, follow a particular protocol, and send power meter files in for examination. The additional data is enough to help ZADA make a determination as to whether the race effort was possible.
At the end of the whole process, ZADA makes a decision. Absent an obvious abnormality in the data, that conclusion has to be somewhat subjective, based on a rider’s previous efforts and what ZADA knows is humanly possible. Think more along the lines of the athlete bio passport, which compares an athlete’s blood and urine values over time and looks for unusual spikes and dips, rather than a cut-and-dried anti-doping positive.
“Do we catch all cheaters? Maybe not,” Chu said. “Do we let cheaters go that had a performance that was maybe a little bit better than they’ve ever been able to do in their entire life? Who knows? I think that’s a hard question to answer, because there are some people who say, ‘I’ve been training for this particular race, like it was the only thing that mattered to me in the last six months.’”
Zwift is in a unique position. They have the ability to build their own anti-cheating system from the ground up. They have more data than any anti-doping agency, and an incentive, at least for now, to keep the game clean. But complete success, the eradication of Zwift cheating, a completely level playing field, that’s never going to happen. Bike racers will make sure of that.