Here we present an exploratory analysis of the full dataset. The workflow we follow in this study is the one suggested in this useful paper.

For more details on how the analysis is done, please refer to this Jupyter notebook hosted on GitHub.

An overview in video

We first give an overview of the data in video form, following the approach of this great example: 200 Countries, 200 Years, 4 Minutes. Here, we use this format to jointly display:

  • the pace (on the Y axis),
  • the experience of the runners (i.e. whether they are beginners or experts, on the X axis),
  • the different races and their number of competitors (dots and their size),
  • the runners’ sex (with colors),
  • the runners’ age.

Conclusions from this video:

  • Young runners run shorter distances;
  • Men’s pace is always slightly lower (i.e. faster) than women’s pace;
  • Pace is higher for very young and older runners, with a minimum around 20 years old;
  • There are many new young runners (under 12 years old) in short-distance races, but most new runners are between 15 and 25 years old;
  • The number of new runners decreases with age, as does the total number of runners of a given age.

Note on unique runners

Given the size of the dataset, and since names are not unique, we encountered some difficulties when trying to distinguish individual runners by their attributes.

Checking the demographics of a few races, we noticed that there were even some namesakes born in the same year but living in different cities. At first glance, the combination of name, birth year and residence city could have been used to identify unique runners; however, this does not necessarily work: some people change residence during their running career (as we found by cross-checking our dataset with other resources on the web) and would then be identified as two different people.

We have therefore decided to stick to a simple rule and to distinguish runners based on their name and birthyear combination.
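As a minimal sketch of this rule, assuming each result is stored as a record with the hypothetical fields `name`, `birthyear`, `city`, `race` and `year`, a runner can be identified by the `(name, birthyear)` pair alone, so that someone who moved city is still counted once:

```python
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical layout of one race result (one row of the dataset)."""
    name: str
    birthyear: int
    city: str
    race: str
    year: int


def runner_id(r: Result) -> tuple[str, int]:
    """Identify a runner by (name, birthyear); the residence city is deliberately ignored."""
    return (r.name, r.birthyear)


def unique_runners(results: list[Result]) -> set[tuple[str, int]]:
    """Return the set of distinct runner identifiers found in the results."""
    return {runner_id(r) for r in results}


results = [
    Result("Anna Muster", 1980, "Bern", "Kerzerslauf", 2010),
    Result("Anna Muster", 1980, "Thun", "Kerzerslauf", 2012),  # same runner, new city
    Result("Anna Muster", 1985, "Bern", "Kerzerslauf", 2012),  # namesake, other birth year
]
print(len(unique_runners(results)))
```

Here the two results from Bern and Thun collapse into a single runner, while the namesake born in 1985 stays distinct. The trade-off of this simple rule is that true namesakes sharing a birth year are merged into one runner.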

Statistics across time

Number of runners across time, by sex

The following graph shows the number of runners across time separated by sex.

We can observe a clear increase over time in the number of race participants, for both sexes, across all of Switzerland.
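A sketch of how such counts could be computed, assuming results are dicts with the hypothetical keys `name`, `birthyear`, `sex` and `year`, and counting each `(name, birthyear)` runner at most once per year (counting one entry per race result instead would be an equally valid choice):

```python
from collections import defaultdict


def runners_per_year_by_sex(results):
    """Count distinct (name, birthyear) runners for each (year, sex) pair.

    `results` is an iterable of dicts with the hypothetical keys
    'name', 'birthyear', 'sex' and 'year'.
    """
    seen = defaultdict(set)  # (year, sex) -> set of runner identifiers
    for r in results:
        seen[(r["year"], r["sex"])].add((r["name"], r["birthyear"]))
    # Sort by year, then sex, so the counts are ready for plotting.
    return {key: len(seen[key]) for key in sorted(seen)}
```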

Number of runners across time, by distance

The following graph shows the number of runners across time separated by race distance. We only consider the most relevant distances, i.e. those with the largest number of runners.
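The selection of the most relevant distances could look like the following sketch, where `distance` is a hypothetical key holding the race distance in km, `k` (the number of distances kept) is an assumed parameter, and each result row counts as one runner in one race:

```python
from collections import Counter


def top_distances(results, k=3):
    """Return the k race distances with the most results overall."""
    counts = Counter(r["distance"] for r in results)
    return [d for d, _ in counts.most_common(k)]


def runners_per_year_by_distance(results, k=3):
    """Count results per (year, distance), keeping only the k most popular distances."""
    keep = set(top_distances(results, k))
    counts = Counter(
        (r["year"], r["distance"]) for r in results if r["distance"] in keep
    )
    return dict(sorted(counts.items()))
```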

Interestingly, over the last 3 years, the number of runners seems to have:

  • increased for the 10 km;

  • remained quite stable for the half marathon;

  • decreased for the marathon.

Distribution of the number of editions per race

The following graph shows the number of races with respect to the number of times each race took place.

As the graph shows, most races have been organized at most twice. The race with the most editions is Chäsitzerlouf - Kehrsatz, which has been organized 16 times. Right behind it, with 15 editions each, come 20km de Lausanne, Basler Stadtlauf, Frauenfelder, Gurtenclassic - Wabern, Kerzerslauf and Schweizer Frauenlauf Bern.
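Assuming each race is held at most once per year, so that an edition is identified by its `(race, year)` pair (an assumption, with `race` and `year` as hypothetical keys), the distribution shown in the graph could be computed as follows:

```python
from collections import Counter


def editions_per_race(results):
    """Number of distinct years (editions) in which each race took place."""
    years = {}
    for r in results:
        years.setdefault(r["race"], set()).add(r["year"])
    return {race: len(ys) for race, ys in years.items()}


def editions_distribution(results):
    """Map number of editions -> number of races organized that many times."""
    counts = Counter(editions_per_race(results).values())
    return dict(sorted(counts.items()))
```

On the output `eds` of `editions_per_race`, `max(eds, key=eds.get)` returns a race with the most editions (the first one encountered in case of a tie).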

Age across editions

For this study, we use the 20 races with the largest number of runners.
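A sketch of the per-edition series used here, under two assumptions: the ranking counts all result rows of a race over all its editions, and a runner’s age is approximated as `year - birthyear`, since only the birth year is known (the dict keys are hypothetical):

```python
from collections import Counter, defaultdict
from statistics import mean


def top_races(results, n=20):
    """The n races with the most results over all editions."""
    return [race for race, _ in Counter(r["race"] for r in results).most_common(n)]


def mean_age_by_edition(results, race):
    """Mean runner age per year for one race; age is approximated as year - birthyear."""
    ages = defaultdict(list)
    for r in results:
        if r["race"] == race:
            ages[r["year"]].append(r["year"] - r["birthyear"])
    return {year: mean(a) for year, a in sorted(ages.items())}
```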

After performing a Mann-Kendall test (with a p-value threshold of 0.05), we conclude that for many of the races analysed, it is not possible to claim a global trend in the runners’ mean age across time. The test is limited by the small number of data points, i.e. of editions per race. We have, however, also obtained some significant results:

  • the runners’ mean age increases for ASICS Bremgarter Reusslauf, Jungfrau-Marathon and Kerzerslauf;

  • the runners’ mean age decreases for 20km de Lausanne, Course de l’Escalade, Hallwilerseelauf, Luzerner Stadtlauf, Morat-Fribourg, Thuner Stadtlauf and Zürcher Silvesterlauf.
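For reference, a minimal implementation of the classic two-sided Mann-Kendall test (normal approximation with tie and continuity corrections) is sketched below; it is not necessarily the implementation used in the notebook. It takes the series of mean ages ordered by edition and uses the same 0.05 threshold:

```python
import math
from collections import Counter


def mann_kendall(x, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    Returns (trend, p_value, s), where trend is 'increasing', 'decreasing'
    or 'no trend' at significance level alpha.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S statistic: sum of the signs of all forward pairwise differences.
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S under the no-trend hypothesis, corrected for tied groups.
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    # Standard score with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return trend, p, s
```

This also illustrates why the lack of data points matters: with this normal approximation, even a perfectly increasing series is not significant at the 0.05 level with four editions (p ≈ 0.09) and only becomes significant from five editions onwards (p ≈ 0.03).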

Statistics on performance vs age

Note that we only consider the races with the highest number of runners, and categories with standard running distances: 10km, 20km/half marathon and full marathon.

We will see that the median finishing time follows a U-shape with respect to age for the most popular events, especially when enough data is available. This shape is most visible for longer distances (full marathon) and somewhat fades out for shorter distances such as the 10 km.
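One simple way to make the U-shape claim quantitative, which is our own illustration and not necessarily the method used in this study, is to fit a parabola to median time versus age by least squares: a positive quadratic coefficient indicates a U-shape, and the vertex gives the age with the fastest fitted median time. The sketch below uses only the standard library and solves the 3x3 normal equations with Cramer's rule:

```python
from statistics import fmean


def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def u_shape(ages, times):
    """Least-squares fit of time = a + b*u + c*u**2, with u = age - mean(age).

    Returns (c, vertex_age): c > 0 means the curve opens upwards (U-shape),
    and vertex_age is the age at which the fitted time is minimal.
    """
    if len(set(ages)) < 3:
        raise ValueError("need at least 3 distinct ages")
    m = fmean(ages)
    us = [x - m for x in ages]  # centring keeps the power sums small
    s = [sum(u**k for u in us) for k in range(5)]
    t = [sum(y * u**k for u, y in zip(us, times)) for k in range(3)]
    # Normal equations: entry (i, j) is the sum of u**(i + j).
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    d = _det3(normal)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        mc = [row[:] for row in normal]
        for i in range(3):
            mc[i][col] = t[i]
        coeffs.append(_det3(mc) / d)
    _, b, c = coeffs
    vertex = m - b / (2 * c) if c != 0 else float("nan")
    return c, vertex
```

Centring the ages before fitting improves the numerical conditioning of the normal equations; the vertex is shifted back to the original age scale at the end.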

Marathon

The following graph shows the median time the runners took to finish a full marathon as a function of their age, for different races.
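A sketch of the per-age medians behind these graphs, assuming results are dicts with the hypothetical keys `race`, `year`, `birthyear` and `time_s` (finishing time in seconds), that all editions of a race are pooled, and that age is approximated as `year - birthyear`; the hypothetical `min_runners` parameter drops ages with too few runners for a reliable median:

```python
from collections import defaultdict
from statistics import median


def median_time_by_age(results, race, min_runners=1):
    """Median finishing time per age for one race, pooling all its editions."""
    times = defaultdict(list)
    for r in results:
        if r["race"] == race:
            times[r["year"] - r["birthyear"]].append(r["time_s"])
    return {
        age: median(ts)
        for age, ts in sorted(times.items())
        if len(ts) >= min_runners  # skip ages with too little data
    }
```

Using the median, as in the graphs, limits the influence of a few very slow finishers on each age group.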

Half marathon / 20km

The following graph shows the median time the runners took to finish a half marathon as a function of their age, for different races.

10km

The following graph shows the median time the runners took to finish a 10km race as a function of their age, for different races.
