History suggests a 50% drop in Britain’s next Olympic medal haul.

During the 2012 London Olympics, Team GB won 65 medals – more than twice its average, and more than it has ever won at any summer games outside London. The last time Britain took more than 65 medals was at the 1908 London Olympics, where it won 146. This alone suggests that hosting the Olympics gives the host country a huge advantage. Greater funding, enthusiastic home crowds and familiarity with the sporting grounds are just a few of the factors that could be leading to more wins.

It could be expected that data would bring greater clarity to the question. Since the first modern Olympics in 1896, there have been 27 summer games under the International Olympic Committee. In 117 years of competition a total of 9,543 medals have been awarded to 19 host nations – plenty of data with which to analyse the effects of hosting the Olympics.

Graphing the overall medal count was not particularly helpful. Nations vary wildly in Olympic achievement, and absolute counts are biased towards larger nations with bigger teams. Plotting the percentage change in total medals won from one games to the next addresses this, as it shows improvement rather than absolute scores. I mapped this against a time scale, shown on the horizontal axis. Zero represents the games hosted by a particular nation, with negative values representing the games leading up to it and positive values the games that followed.
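As a rough illustration of the approach described above, the sketch below computes the percentage change between consecutive games and numbers each games relative to the one a nation hosted. The function names are my own, and treating a change from zero medals as undefined is an assumption rather than something the graphs necessarily do. The figures are Great Britain's totals from the data table in note 3.

```python
def pct_change(prev, curr):
    """Percentage change from prev to curr; None when prev is 0 (undefined)."""
    if prev == 0:
        return None
    return (curr - prev) / prev * 100.0


def align_to_host(medals_by_year, host_year):
    """Return (offset, pct_change) pairs, where offset 0 is the hosted games."""
    years = sorted(medals_by_year)
    host_index = years.index(host_year)
    aligned = []
    for i in range(1, len(years)):
        change = pct_change(medals_by_year[years[i - 1]], medals_by_year[years[i]])
        aligned.append((i - host_index, change))
    return aligned


# Great Britain's totals as listed in the data table (note 3).
great_britain = {2000: 28, 2004: 30, 2008: 47, 2012: 65}
for offset, change in align_to_host(great_britain, host_year=2012):
    print(offset, None if change is None else round(change, 1))
```

Counting offsets by position in the list of games, rather than by calendar years, keeps the scale consistent across the gaps left by cancelled games.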

There is a very clear spike at the games a country hosts, with a 468% average increase in total medals won, followed by an average drop of 48% at the next games. Even so, the data suggests that hosting the Olympics has some form of lasting effect: from the games prior to hosting to two games after, a host nation can expect an average increase of 195%. That is a huge jump which, according to the current model, does not change significantly thereafter. Unfortunately, the average is skewed by a significant outlier that considerably changes the model.

The 1908 London Olympics significantly skews the average. In 1904, Britain sent only 3 athletes to the St. Louis Summer Olympics, and they won a mere two medals. At home in 1908, Britain’s team had 676 athletes. This time they won 146 medals, Britain’s highest total to date. This represented a 7200% increase in medals won.

This is an anomaly in the purest sense of the word. The graphs are set so that they do not display it by default. However, clicking on the relevant Olympiad in the legend will display the data point, giving a sense of how unusual it is. Ignoring the 1908 London Olympics drops a host nation’s average increase in medals from 468% to 188%. Considering once again the period from the games prior to hosting to two games after, the host nation would now expect an average increase of 50%, compared with the previous model’s 195%.
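To see how a single point can move an average this far, here is a small sketch of the arithmetic. The 468% average and the 7200% outlier come from the text; the number of host data points is not stated in the article, so `n = 25` is purely an assumption chosen because it roughly reproduces the quoted 188%.

```python
def mean_without(mean, n, outlier):
    """Mean of the remaining n - 1 values after removing one known value."""
    return (mean * n - outlier) / (n - 1)


# 468 and 7200 come from the text; n = 25 is an assumption (not stated in the article).
adjusted = mean_without(mean=468.0, n=25, outlier=7200.0)
print(round(adjusted, 1))
```

With a different assumed `n` the adjusted figure changes, so this should be read as a plausibility check rather than a reconstruction of the calculation behind the graphs.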

The next graph provides further analysis of the average and makes those adjustments (the unadjusted graphic is also provided). The adjusted average excludes four anomalies: two cases where huge percentage changes were due to small numbers (for example, a jump from one to six medals is a 500% increase), and two cases where there was a significant increase in the number of attending athletes.

Here, I considered what the spread around the average would look like assuming the data is normally distributed. In layman’s terms, this is akin to considering how likely it is for nations to deviate from the average. I’ve considered two levels of certainty: 68% and 95%, or one and two standard deviations. This means that, assuming the data is normally distributed*, you can be 95% certain that a nation will not fall outside the light blue area, and 68% certain that it will not leave the dark blue area.

You may also notice that none of the adjusted graphs goes below -100%. This is because nations cannot attain a negative medal count, which a change of less than -100% would imply. To show how well the data fits these assumptions, I’ve shown a graph of the adjusted average and its normal-distribution bands against the data points. If the points cluster around the average and within the blue areas, with only a few points in the white area, then this is an accurate prediction of a host nation’s performance at the Olympics.
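The bands described above can be sketched as the mean plus or minus one and two standard deviations, with the lower edges cut off at -100%. This is a minimal sketch under assumptions: the use of the sample standard deviation and the function name `bands` are mine, and the input is Great Britain's own run of totals from the data table, used only to exercise the function rather than to reproduce the cross-nation bands in the graph.

```python
from statistics import mean, stdev


def bands(changes, floor=-100.0):
    """Mean plus one- and two-sigma bands, with lower edges clipped at the floor."""
    m = mean(changes)
    s = stdev(changes)  # sample standard deviation: an assumption
    return {
        "mean": m,
        "68%": (max(m - s, floor), m + s),
        "95%": (max(m - 2 * s, floor), m + 2 * s),
    }


# Great Britain's totals for 1992-2012 from the data table, used only as sample input.
totals = [20, 15, 28, 30, 47, 65]
changes = [(b - a) / a * 100.0 for a, b in zip(totals, totals[1:])]
result = bands(changes)
for name, value in result.items():
    print(name, value)
```

Clipping only the lower edge mirrors the point that a medal count cannot fall by more than 100%, while there is no matching ceiling on increases.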

The data also shows that it is highly unusual for nations to more than triple their medal count after hosting, with the average hovering between -10% and 20%. Additionally, while hosting the Olympics offers a greater chance of improved performance, it doesn’t prevent failure. Of the 262 data points, I found 14 that land outside the light blue area – a little more than 5%, which is what a 95% band would lead you to expect. In short, the confidence intervals seem to be accurate, which suggests the following statement would be correct:
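The coverage check above can be reproduced with a few lines. The 14 and 262 are taken from the text; the expected shares come from the standard normal distribution, where "95%" for two standard deviations is itself a rounding of roughly 95.4%. The function name is hypothetical.

```python
import math


def expected_outside(k):
    """Share of a normal distribution lying more than k standard deviations from the mean."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


observed = 14 / 262  # points outside the light blue area, from the text
print(f"observed outside two sigma: {observed:.1%}")
print(f"expected outside two sigma: {expected_outside(2):.1%}")
print(f"expected outside one sigma: {expected_outside(1):.1%}")
```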

“Based on historical data you can be 68% certain that Britain’s medal count in 2016 will be between 11 (a -82% change) and 56 (a -14% change). On average, nations in Britain’s position would win 34 medals in 2016, 13 fewer than Team GB achieved in Beijing 2008.”
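For readers who want to check the conversion between percentage changes and medal counts, a short sketch follows. It starts from the 65 medals won in 2012; because the quoted percentages are rounded, the results only approximately match the 11 and 56 in the statement. The helper name `project` is my own.

```python
def project(base_medals, pct_change):
    """Medal count implied by a percentage change from a base count."""
    return base_medals * (1.0 + pct_change / 100.0)


london_2012 = 65
low = project(london_2012, -82)    # lower edge of the 68% band
high = project(london_2012, -14)   # upper edge of the 68% band
# Change implied by the average outcome of 34 medals, close to the headline's 50% drop.
implied_mean_change = (34 - london_2012) / london_2012 * 100.0
print(round(low, 1), round(high, 1), round(implied_mean_change, 1))
```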

It is important to point out how fallible these metrics can be. The average I took at the start was more than twice what it would otherwise have been, only because of a single data point from 105 years ago. Single data points can significantly change what an average means, and averages should always be considered against other metrics. Additionally, the confidence intervals assume that the data is normally distributed. I explain in the notes below why I believe that is not an unreasonable assumption, but it should never be taken as certain.

As for the data itself – don’t expect some great sporting legacy from hosting the Olympics. Yes, you could be the anomaly. It is possible, and none of this says that hosting the Olympics is ever a bad thing. Yet this much remains true: the model shows that there is only a chance of a significant bump in medals when you are hosting, and that the most consistent data point was the hangover that followed. Then again, I suppose you could argue that hosting an Olympics would not be special if the high lasted forever.

 

Notes

1) Normal distribution

The assumption that data fits a normal distribution is common in economics, and some consider it a contributing factor behind the 2008 financial crisis. Value at Risk, or VAR, is a common method of measuring the maximum loss in banking. It assumes that revenue or profit is normally distributed – but it very simply is not. That doesn’t necessarily make it a bad tool, but it does make it a dangerous one. The important part is to acknowledge the limitations of the tools used rather than relying on them as gospel. VAR will be wrong just as this model will be wrong. The aim when using statistics in economics is to be more right than you are wrong – and I hope the last diagram displayed helps to show that.

As a point of interest, I have also included a graph contrasting the data against a normal distribution – the more similar they are, the more likely it is that the data is normally distributed. The line shows what the data would look like if it were normally distributed, while the columns show the actual distribution of the data.

When interpreting this graph, it is important to remember that the area is equivalent to the probability so that, in theory, the total area under the line would be equal to one. While the data is not a perfect fit, there are points that hit the line, and peaks in the right areas. In short, it could be much worse.
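The comparison above can be sketched by scaling the columns so that their total area is one and then evaluating a normal curve at the centre of each column. This is only an illustration under assumptions: the bin range and count are arbitrary, the helper names are mine, and the input is again Great Britain's totals from the data table rather than the full set of 262 points.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def density_histogram(values, lo, hi, bins):
    """Column heights scaled so that the columns' total area is 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError("value outside the histogram range")
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts], width


# Great Britain's totals for 1992-2012 from the data table, used only as sample input.
totals = [20, 15, 28, 30, 47, 65]
changes = [(b - a) / a * 100.0 for a, b in zip(totals, totals[1:])]

# The range and bin count are arbitrary choices for this sketch.
heights, width = density_histogram(changes, lo=-50.0, hi=100.0, bins=3)
mu, sigma = mean(changes), stdev(changes)
for i, h in enumerate(heights):
    centre = -50.0 + (i + 0.5) * width
    print(f"{centre:6.1f}  column={h:.4f}  line={normal_pdf(centre, mu, sigma):.4f}")
```

The closer the column heights sit to the line values, the better the normal assumption holds for that data.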

In my opinion, this graph says a lot about the use of statistics in social science – in fact, this is why economics is so frequently considered the “dismal science”. The statistical theory is right, and is often highly effective when applied in other fields. In physics, maths is so effective that it can be used to establish theory, allowing the evidence to be sought out subsequently. Sadly, atoms appear to be more rational than human beings, or at least more consistent. That said, using a dismal science is better than no science at all. Nobel laureate Paul Krugman neatly analyses this relationship in this blog post.

 

2) The anomalies

There were four data points that I did not include in the adjusted average. For the sake of openness, here are those points and why I excluded them. I did include them in the comparison between the certainty levels and the data points, which clearly shows them to be significant outliers from the average.

  • Great Britain in 1908 – As mentioned earlier, Team GB went from 3 athletes sent abroad in 1904 to 676 at home in 1908. The significantly larger team (and thus the ability to compete for more medals) makes the 7200% increase much less surprising.
  • Australia in 1948 – Australia more than doubled the size of its team between the 1936 games and 1948, from 33 to 77. Its medal tally rose from 1 to 13, a 1200% increase.
  • South Korea in 1976 and Mexico in 2000 – While an improvement of 5 medals is not necessarily much, it is a substantial improvement on a previous medal haul of 1 – a 500% improvement. Cases like this show how percentage increase discriminates against those who consistently have higher medal counts, while favouring those who win one or fewer. Nonetheless, the points were infrequent enough for me to feel comfortable casting them aside.

 

3) The data

Country Great Britain China Greece Australia United States Spain South Korea Soviet Union Canada West Germany Mexico Japan Italy Finland Germany Netherlands France Belgium Sweden
2012 65 88 2 35 104 17 28 18 7 38 28 3 44 20 34 3 8
2008 47 100 4 46 110 18 31 18 3 25 27 4 41 16 41 2 5
2004 30 63 16 50 101 20 30 12 4 37 32 2 49 22 33 3 7
2000 28 58 13 58 93 11 28 14 6 18 34 4 56 25 38 5 12
1996 15 50 8 41 101 17 27 22 1 14 35 4 65 19 37 6 8
1992 20 54 2 27 108 22 29 18 1 22 19 5 82 15 29 3 12
1988 24 28 1 14 94 4 33 132 10 40 2 14 14 4 9 16 2 11
1984 37 32 2 24 174 5 19 44 59 6 32 32 12 13 28 4 19
1980 21 3 9 6 195 4 15 8 3 14 1 12
1976 13 0 5 94 2 6 125 11 39 2 25 13 6 5 9 6 5
1972 18 2 17 94 1 1 99 5 40 1 29 18 8 5 13 2 16
1968 13 1 17 107 0 2 91 5 26 9 25 16 4 7 15 2 4
1964 18 0 18 90 0 3 96 4 1 29 27 5 10 15 3 8
1960 20 1 22 71 1 0 103 1 1 18 36 5 3 5 4 6
1956 24 1 35 74 0 2 98 6 2 19 25 15 0 14 2 19
1952 11 0 0 11 76 1 2 71 3 1 9 21 22 24 5 18 4 35
1948 23 0 0 13 84 1 2 3 5 27 20 16 29 7 44
1936 14 0 0 1 56 9 3 18 22 19 89 17 19 2 20
1932 16 0 0 5 103 1 15 2 18 36 25 20 7 19 0 23
1928 20 0 4 56 1 15 0 5 19 25 31 19 21 3 25
1924 34 0 6 99 0 4 0 1 16 37 10 38 13 29
1920 43 1 3 95 2 9 2 23 34 11 41 36 64
1912 41 2 63 8 0 6 26 25 3 14 6 65
1908 146 3 47 16 4 5 13 2 19 8 25
1904 2 2 0 239 1 6 4 13 0
1900 30 0 5 47 2 1 0 8 4 101 15 1
1896 7 0 46 2 20 13 11 0

  Source: Wikipedia

(There are discrepancies between different sources (some recognise the 1906 Olympics), and the IOC’s database is a chore to export. Wikipedia provides an easily transferable, if questionable, source of data.)
