Data Analysis with Pandas, Plotly, and Matplotlib
This article is a journey through the history of the Bundesliga. By analyzing historical data (all standings from 1963 until 2020), we will be able to answer many questions about the German league. Which teams have won the German league? Which teams nearly won the Bundesliga? When did Bayern’s hegemony start? Which teams receive the most penalties? … and many more! Continue reading ▶️
Let’s start with a brief introduction for those who have never heard of the German league. 🙌
The German football league, commonly known as the Bundesliga, is the top national football league in Germany and one of the most popular professional sports leagues in the world. It was founded in 1963 after the unification of five regional leagues from West Germany and initially consisted of 16 teams.
At the end of a match, the winning team is awarded three points (two points before the 1995–96 season) and the losing team receives none. In case of a draw, each team receives one point.
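To make the scoring rule concrete, here is a minimal sketch of a helper that computes a team’s points from its wins, draws, and losses. The function name `league_points` and the convention of identifying a season by its starting year (1995 for 1995–96) are our own assumptions, not part of the original analysis.

```python
def league_points(won: int, drawn: int, lost: int, season_start: int) -> int:
    """Points for a season record under the Bundesliga scoring rule.

    A win is worth 3 points from the 1995-96 season onwards and 2 points
    before that; a draw is worth 1 point and a loss 0 points.
    `season_start` is the first calendar year of the season (hypothetical
    convention: 1995 means 1995-96).
    """
    points_per_win = 3 if season_start >= 1995 else 2
    return points_per_win * won + 1 * drawn + 0 * lost


# The same record is worth different totals before and after the rule change
print(league_points(20, 5, 9, 1994))  # 45
print(league_points(20, 5, 9, 1995))  # 65
```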
In many European leagues, the bottom three teams are automatically relegated to the second division. In the Bundesliga, by contrast, only the bottom two are relegated directly to the 2. Bundesliga. The team finishing 16th in the Bundesliga and the team finishing 3rd in the 2. Bundesliga contest a two-legged play-off for a place in the top division.
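The relegation rule can be sketched as a small function that maps a final position to its outcome. The name `season_outcome` is hypothetical, and the sketch encodes only the rule described above for an 18-team league; it is not meant to reproduce earlier seasons.

```python
def season_outcome(position: int, n_teams: int = 18) -> str:
    """Classify a final Bundesliga position under the rule described above.

    The bottom two teams are relegated directly, and the third-from-bottom
    team (16th of 18) plays the two-legged relegation play-off.
    """
    if not 1 <= position <= n_teams:
        raise ValueError(f"position must be between 1 and {n_teams}")
    if position > n_teams - 2:
        return "relegated"
    if position == n_teams - 2:
        return "relegation play-off"
    return "safe"


for pos in (1, 15, 16, 17, 18):
    print(pos, season_outcome(pos))
```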
Introduction done! Now we are ready to analyze the data ❤️
The historical data of the Bundesliga (from 1963 until 2020) was scraped from Bdfutbol.com, a website that provides the standings of the major European football leagues.
To scrape the data, we used BeautifulSoup, a popular Python library for extracting information from HTML pages. After obtaining all the data, we stored it in a Pandas data frame for further processing.
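The following is a minimal sketch of that scraping step, not the author’s actual code. It parses an inline HTML table so that it runs offline; in practice the HTML would come from an HTTP request (for example `requests.get(url).text`). The markup, the Spanish headers (`Pos`, `Equipo`, `Pts`), the rows, and the helper name `table_to_frame` are all hypothetical, and the real Bdfutbol.com pages may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded standings page; the rows
# are made up, and the trailing asterisk mimics a footnote marker.
html = """
<table>
  <tr><th>Pos</th><th>Equipo</th><th>Pts</th></tr>
  <tr><td>1</td><td>Team A</td><td>70</td></tr>
  <tr><td>2</td><td>Team B</td><td>65*</td></tr>
</table>
"""


def table_to_frame(page_html: str) -> pd.DataFrame:
    """Parse the first HTML table on a page into a DataFrame of strings."""
    soup = BeautifulSoup(page_html, "html.parser")
    table = soup.find("table")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    # The first row holds the headers; the remaining rows hold the data
    return pd.DataFrame(rows[1:], columns=rows[0])


standings = table_to_frame(html)
print(standings)
```

Every value comes back as a string, which is exactly why the cleaning step below is needed.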
The programming code used in this analysis is available here. You can take a look at it as you read the article.
Data cleaning is the process of transforming raw data into a standardized form that can easily be analyzed with data analytics tools. In this case, we perform a few cleaning operations before analyzing the data with Pandas. First, we remove unnecessary columns and rename the remaining ones using English terms (remember that the data was scraped from a Spanish website). Then, we fix incorrect data types. The column points (points obtained by a team during a particular season) has the data type object instead of integer because some of its entries contain asterisks. These asterisks refer to explanatory notes at the bottom of the web page and are not relevant for this analysis. Therefore, we have to remove them before converting the column points to an integer data type.
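A minimal sketch of these cleaning steps with Pandas might look like the following. The Spanish column names, the dropped `Notas` column, and the rows are hypothetical placeholders, not the actual headers or data from the scraped website.

```python
import pandas as pd

# Hypothetical raw table as it might come out of the scraper (all strings)
raw = pd.DataFrame({
    "Pos": ["1", "2", "3"],
    "Equipo": ["Team A", "Team B", "Team C"],
    "Pts": ["70", "65*", "61**"],
    "Notas": ["", "", ""],
})

# 1. Drop the column we do not need and translate the remaining headers
clean = (
    raw.drop(columns=["Notas"])
       .rename(columns={"Pos": "position", "Equipo": "team", "Pts": "points"})
)

# 2. Strip the footnote asterisks (literal match, not a regex),
#    then convert the numeric columns to integers
clean["points"] = (
    clean["points"].astype(str).str.replace("*", "", regex=False).astype(int)
)
clean["position"] = clean["position"].astype(int)

print(clean.dtypes)
```

Passing `regex=False` matters here: `*` is a special character in regular expressions, so treating it literally avoids an error.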
After cleaning the data, we obtain a Pandas data frame that can easily be processed to draw conclusions. As shown below, the data frame contains information such as the number of games won, drawn, and lost, the number of yellow and red cards, the number of points, and the final position of every team that took part in the Bundesliga from 1963 until 2020.
57 different clubs have played in the Bundesliga during its 57 seasons (up to 2019–20), but only twelve of them have lifted the trophy. The following plot shows the German league winners from the 1963–64 season to 2019–20.
As shown above, Bayern München is the most successful club in the history of the Bundesliga with 29 titles, more than 50% of all the leagues played. The next most successful teams are Borussia Mönchengladbach and Borussia Dortmund, each of which has won the Bundesliga five times. Other teams, such as Werder Bremen, Hamburger, Stuttgart, Köln, and Kaiserslautern, have also had the honor of lifting the Bundesliga trophy more than once.
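Counting titles is a one-liner once the data frame is clean: filter the rows where the final position is 1 and count the teams. The sketch below assumes columns named `season`, `team`, and `position`, and uses a made-up toy table rather than the real data.

```python
import pandas as pd

# Toy standings table (made-up rows) with the assumed column names
df = pd.DataFrame({
    "season": ["2017-18", "2017-18", "2018-19", "2018-19", "2019-20", "2019-20"],
    "team": ["Team A", "Team B", "Team A", "Team B", "Team B", "Team A"],
    "position": [1, 2, 1, 2, 1, 2],
})

# Champions are the teams that finished first in each season
titles = df.loc[df["position"] == 1, "team"].value_counts()

# Share of all seasons won by each team
share = titles / df["season"].nunique()

print(titles)
print(share.round(2))
```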
Six teams have never won the league but have finished as runners-up on one or more occasions: Alemannia Aachen, Bayer Leverkusen, Hertha Berliner, Meidericher, RB Leipzig, and Schalke 04. As shown below, Schalke 04 and Bayer Leverkusen have been particularly unlucky, finishing as runners-up 7 and 5 times respectively. We can also observe that Bayern München is the club that has finished runner-up most often (10 times).
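Finding the "eternal runners-up" amounts to counting second-place finishes and removing every team that has ever finished first. This is a sketch on a made-up toy table, assuming the same `season`, `team`, and `position` columns as before.

```python
import pandas as pd

# Toy standings table (made-up rows)
df = pd.DataFrame({
    "season": ["2017-18", "2017-18", "2018-19", "2018-19", "2019-20", "2019-20"],
    "team": ["Team A", "Team B", "Team A", "Team C", "Team A", "Team C"],
    "position": [1, 2, 1, 2, 1, 2],
})

champions = set(df.loc[df["position"] == 1, "team"])
runners_up = df.loc[df["position"] == 2, "team"].value_counts()

# Keep only the runners-up that never finished first
never_won = runners_up[~runners_up.index.isin(champions)]
print(never_won)
```

Dropping the filter and calling `runners_up.idxmax()` instead would return the team with the most second-place finishes overall, champions included.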
In the 2016–17 season, RB Leipzig finished second in the Bundesliga. The club was only founded in 2009 and is currently one of the leading teams in Germany, largely thanks to significant investment from the company Red Bull.
Werder Bremen holds the record for the most seasons played in the German league. The club has played in 56 of the Bundesliga’s 57 seasons and has been relegated to the second division only once. Bayern München has played in the Bundesliga uninterrupted since 1965, and Hamburger did so from 1963 until 2018; both clubs have 55 seasons in total. Apart from these clubs, Borussia Dortmund, Stuttgart, Borussia Mönchengladbach, Schalke 04, and Eintracht Frankfurt have also played more than 50 seasons in the German league.
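The number of seasons each club has spent in the league is the number of distinct seasons in which it appears. A minimal sketch, again on a made-up toy table with assumed `season` and `team` columns:

```python
import pandas as pd

# Toy table (made-up rows): one row per team per season played
df = pd.DataFrame({
    "season": ["1963-64", "1963-64", "1964-65", "1965-66", "1965-66"],
    "team": ["Team A", "Team B", "Team A", "Team A", "Team C"],
})

# Count distinct seasons per team, most seasons first
seasons_played = (
    df.groupby("team")["season"].nunique().sort_values(ascending=False)
)
print(seasons_played)
```

On the full data, a filter such as `seasons_played[seasons_played > 50]` would list the clubs with more than 50 seasons.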
In the most recent season covered (2019–20), all the teams in the image above played in the Bundesliga except Hamburger, Stuttgart, and Kaiserslautern.
The Bundesliga began with 16 teams in 1963 and was enlarged to 18 teams in 1965. Since then, the number of clubs has remained unchanged except in the 1991–92 season, when the league was temporarily expanded to 20 teams to accommodate clubs from the former East Germany.
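The league size per season can be read straight from the data by counting distinct teams in each season. The sketch below uses a tiny made-up table and flags any season whose size differs from the most common one, which is how an unusual season like 1991–92 would stand out.

```python
import pandas as pd

# Toy table (made-up rows): one unusually large season in the middle
df = pd.DataFrame({
    "season": ["1990-91"] * 3 + ["1991-92"] * 4 + ["1992-93"] * 3,
    "team": ["A", "B", "C", "A", "B", "C", "D", "A", "B", "D"],
})

teams_per_season = df.groupby("season")["team"].nunique()

# Seasons whose league size differs from the most common size
usual_size = teams_per_season.mode().iloc[0]
unusual = teams_per_season[teams_per_season != usual_size]
print(unusual)
```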
Currently, 18 teams play in the German league, the same number as in the Primeira Liga (Portugal) and the Eredivisie (Netherlands). By contrast, 20 teams take part in Serie A (Italy), La Liga (Spain), Ligue 1 (France), and the Premier League (England). There is still debate over whether the Bundesliga should add two more teams to bring it in line with the major European leagues.
Bayern München has clearly dominated the Bundesliga throughout its history, winning more than half of all titles. Since 2000, Bayern München has won 14 of 20 leagues, making it almost impossible for other teams to compete with the Reds. But has the Bundesliga always been dominated by Bayern München? When did Bayern’s hegemony begin? Let’s answer these questions with a simple plot.
The following interactive line chart shows how the number of titles won by each team has evolved over time. To focus on particular teams, you can show or hide traces by clicking on their legend entries. You can also hover the mouse over a data point to see its season and number of titles.
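A chart like this needs, for every team and season, the number of titles won up to that season. One way to build that table with Pandas is sketched below; the one-row-per-season champions table and its rows are made up for illustration, and this is not necessarily how the original chart was produced.

```python
import pandas as pd

# Toy table of champions (made-up rows): one row per season
champions = pd.DataFrame({
    "season": ["2015-16", "2016-17", "2017-18", "2018-19", "2019-20"],
    "team": ["Team A", "Team A", "Team B", "Team A", "Team B"],
})

# Season x team matrix of 0/1 title indicators (seasons sorted)
wins = pd.crosstab(champions["season"], champions["team"])

# Running total of titles, reshaped to long format: season, team, titles
cumulative = (
    wins.cumsum()
        .reset_index()
        .melt(id_vars="season", var_name="team", value_name="titles")
)
print(cumulative)
```

The long format suits Plotly Express: `plotly.express.line(cumulative, x="season", y="titles", color="team")` draws one trace per team, and clicking a legend entry toggles that trace on or off.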