Using Math to Catch Athletes Who Dope


In the British spa town of Tunbridge Wells, in the seventeen-forties, a Presbyterian clergyman named Thomas Bayes penned a manuscript that would, in the centuries following its posthumous publication, reshape the understanding of probability. Bayes described an approach to making predictions that allowed for the mixing of new data with existing beliefs. The mathematician Richard Price, in his introduction to the original manuscript, in 1763, illustrated the concept using “the case of a person just brought forth into this world.” For such a person, Price wrote, seeing the sun rise for the first time would be miraculous; he would have no basis for predicting whether it would ever happen again. When it rose the next morning, though, he would revise up his estimate of the probability of its happening a third time—and so on, until he approached, but never quite reached, certainty. Bayes provided a practical framework for how to make this sort of prediction, and the French mathematician Pierre-Simon Laplace built on his efforts. With modern computing power to speed up the calculations, Bayes’s method has found widespread application. The U.S. Coast Guard’s Search and Rescue Optimal Planning System, for instance, reckons the likelihood of finding a disabled ship or a stranded sailor in relation to a number of variables, including wind and current data and the search paths already covered.
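Price's newborn observer is, in modern terms, applying what Laplace later formalized as the rule of succession: after seeing the sun rise on n consecutive mornings, the estimated probability that it rises again is (n + 1)/(n + 2), which climbs toward one without ever reaching it. A minimal sketch in Python (illustrative only; the function name is ours, not anything from the original manuscript):

```python
def rule_of_succession(successes: int, trials: int) -> float:
    """Laplace's rule of succession: the posterior mean of a
    uniform (Beta(1, 1)) prior updated with `successes` out of
    `trials` observed outcomes."""
    return (successes + 1) / (trials + 2)

# The newborn observer's estimate that the sun will rise tomorrow,
# after seeing it rise every morning so far: it approaches, but
# never quite reaches, certainty.
for days in (1, 10, 100, 10_000):
    print(days, rule_of_succession(days, days))
```

Before any sunrise at all, the formula gives 1/2, the blank-slate guess; after ten thousand sunrises it gives roughly 0.9999, still short of certainty, just as Price described.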

For sports fans, this sort of prior knowledge is the steroid-crazed gorilla in the room whenever a new doping scandal emerges. From Ben Johnson to Barry Bonds, BALCO to Biogenesis, athletes have given us ample reason to assume that, as a rule, every accusation that appears in the press is true, as are many rumors that go unpublished. The latest big one involves a leaked database of more than a decade’s worth of blood-test results—twelve thousand in all—from five thousand track-and-field athletes. According to Britain’s Sunday Times and the German broadcaster ARD, more than eight hundred athletes in the database showed highly suspicious blood values between 2001 and 2012, and they won a third of the medals in endurance events at the Olympics and the International Association of Athletics Federations World Championships in those years.

More of the same, in other words, except for the scale. With this year’s I.A.A.F. Championships under way in Beijing, the track world is in turmoil as gossip circulates about who is or isn’t on the list. Eight British athletes, including Mo Farah, the defending Olympic and World five-thousand- and ten-thousand-metre champion, ignored the advice of U.K. track officials and allowed their blood values to be published, in an attempt to distance themselves from the scandal. That leaves others looking even more suspicious, but it doesn’t offer a useful way forward for the sport—because to understand what these blood tests mean, it helps to understand Bayesian statistics.

A decade ago, sports scientists turned to Bayes to combat doping agents such as erythropoietin, a hormone that became widely available in the nineteen-nineties and was nearly impossible to detect with conventional urine or blood tests. They soon realized that, even if they couldn’t detect the drugs themselves, they could often detect the effects of the drugs—changes in the levels and ratios of young and mature red blood cells, for example. Even better, this approach could flag athletes who were turning to so-called autologous blood doping, extracting and later reinjecting their own blood without actually using any illicit substances. But many of the telltale blood parameters vary more between individuals than within a single person, so whatever thresholds the scientists set would either catch too many innocents or miss too many cheats. The Bayesian solution: an “athlete biological passport” that calculates individual thresholds of suspicion based on repeated tests—ideally four to six per season—of the same athlete.
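The passport's real adaptive model is considerably more elaborate, but the core idea (limits that begin at population values and tighten around an individual's own baseline as tests accumulate) can be sketched with a simple normal-normal Bayesian update. Every number below is an illustrative assumption, not an actual passport parameter:

```python
import math

# Illustrative sketch of an "adaptive threshold": a normal-normal
# Bayesian update in which the prior reflects the population and
# each new test pulls the limits toward the athlete's own baseline.
# All parameters are invented for illustration.
POP_MEAN = 14.5    # population hemoglobin mean, g/dL (assumed)
POP_SD = 1.0       # between-athlete spread (assumed)
WITHIN_SD = 0.5    # within-athlete test-to-test spread (assumed)
Z99 = 2.576        # two-sided 99-per-cent normal quantile

def adaptive_limits(past_tests):
    """Return (low, high) 99-per-cent limits for an athlete's next
    test, given that athlete's previous results."""
    n = len(past_tests)
    if n == 0:
        # No history yet: judge against the population alone.
        mean, var = POP_MEAN, POP_SD ** 2
    else:
        prior_prec = 1 / POP_SD ** 2
        data_prec = n / WITHIN_SD ** 2
        var = 1 / (prior_prec + data_prec)
        mean = var * (prior_prec * POP_MEAN
                      + data_prec * (sum(past_tests) / n))
    pred_sd = math.sqrt(var + WITHIN_SD ** 2)  # predictive spread
    return mean - Z99 * pred_sd, mean + Z99 * pred_sd

# An athlete with naturally high readings: the first test is judged
# against population limits, but the band soon recenters and narrows.
history = []
for reading in (16.2, 16.0, 16.3, 16.1):
    low, high = adaptive_limits(history)
    print(f"limits ({low:.1f}, {high:.1f}) -> reading {reading}")
    history.append(reading)
```

Run on this made-up athlete, the band starts wide and centered on the population mean, then tightens around her own typical values within a few tests, which is the behavior the passport relies on.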

A 2011 study published in the journal Transfusion, from researchers at the University Hospital of Freiburg and the Swiss Laboratory for Doping Analyses, illustrates the strengths and weaknesses of the technique. The researchers followed twenty-one athletes—eleven of whom were doping—for nearly a year, analyzing monthly blood tests. Eight of the dopers were flagged as suspicious with ninety-nine-per-cent certainty, the threshold that triggers further investigation. There was only one false positive from a non-doper, a suspicious hemoglobin reading registered in the first of eleven tests. (That test, of course, had no Bayesian element, since there wasn’t yet any comparison data. Its results were measured against population averages.) This particular subject happened to have naturally high hemoglobin; after a few tests, the Bayesian thresholds tightened around her typical values.

Individual variability isn’t the only factor that can trip up attempts to infer doping indirectly. A study published this month in the journal Drug Testing and Analysis monitored ten swimmers from the Danish Olympic team before and after a three- to four-week high-altitude training camp. Their bodies responded to the thin mountain air by producing more hemoglobin, followed by a drop in new red blood cells when they returned to sea level. As a result, six of the ten swimmers triggered doping alarms, despite the use of repeated tests and Bayesian thresholds. Hydration status and even pregnancy can also affect test values, which is why the formal biological-passport program, which was implemented in track and field in 2009, dishes out suspensions only after the results have been vetted by three independent experts who have access to additional information, such as the athlete’s training and competition schedule.

The recently leaked database contains, on average, 1.4 test results per athlete, defeating the rationale of the Bayesian approach. Taken as a whole, of course, its results do suggest rampant blood doping, even if it isn’t clear exactly who is dirty. Back in 2011, researchers working with the I.A.A.F. published a peer-reviewed analysis of the database, concluding that about eighteen per cent of the endurance athletes who had been tested were likely doping. That figure is similar to the Sunday Times’s estimate of eight hundred out of five thousand (sixteen per cent). The difference is that, in the I.A.A.F. study, the data was anonymized, whereas the newspaper revelations contain tantalizing hints about who is on the list—seven British athletes, for example, and a whole lot of Russians. Specific names will inevitably leak out as the database circulates among journalists.

The more pressing question is not who is guilty but what authorities are doing about the situation. In a series of increasingly frantic press releases, the I.A.A.F. has argued that the biological-passport process is working: more than a hundred and fifty cases have been referred to its expert panel since 2011, resulting in sixty-three prosecutions, and suspicious results have repeatedly been used to trigger “intelligence-led, no-advance-notice, OOC [out-of-competition] testing.” Whether you find that convincing—and how you reconcile this clash of collective knowledge and individual uncertainty—is as much a matter of personality as of statistics.