Note for following along

Topics

Here we look at ways to compare clubs’ performances across seasons using a variety of SQL techniques and we once again explot DuckDB’s macro implementation.

Basic statistics for clubs and seasons

We have all the tables and views we need to perform some comparisons between clubs over the seasons.

We will start answering these questions:

  • Which 10 clubs have collected the most points over all seasons?
  • Which 10 clubs have collected the highest average points per game over all seasons that they have played in the EPL?
  • Which season produced the highest/lowest goals total?

Answering these types of questions will develop our analytical skills using SQL. The view vw_ltables contains all the data we need to answer these questions. Try answering them yourself and consult the solutions if needed. None of the SQL for this blog post is diffiecult.

Which club has collected the most points over all seasons?

SELECT
  ccode,
  SUM(points) points_total
FROM vw_ltables
GROUP BY ccode
ORDER BY points_total DESC
LIMIT 10;

This produces the following 10 rows:

ccode points_total
MUN 2518
ARS 2329
LIV 2233
CHE 2215
TOT 1902
MCI 1805
EVE 1655
NEW 1563
AVL 1503
WHU 1334

Join to clubs and use the club name if you prefer this to the club code. The query output is not surprising because 6 of those clubs have been in the EPL for all seasons in our dataset and Manchester United have won the most league titles so it is hardly surprising that they have earned the most points over all the seasons. You might expect Manchester City (MCI) to be higher, but they have not played in all EPL season. Let’s see how they fare when we calculate average points per game played in the next solution.

Which club has collected the highest average points per game over all seasons that it has played in the EPL?

We suspect that because Manchester City have played in fewer EPL seasons that the other top 10 that we are underestimating their record. If we correct for the number of games played then perhaps we will see a different outcome.

SELECT
  ccode,
  ROUND(SUM(points)/SUM(played), 2) points_per_game
FROM vw_ltables
GROUP BY ccode
ORDER BY points_per_game DESC
LIMIT 10;

And we get:

ccode points_per_game
MUN 2.05
ARS 1.9
LIV 1.82
CHE 1.8
MCI 1.74
TOT 1.55
NEW 1.41
LEE 1.4
BLB 1.39
BRE 1.38

Actually, not much change!

Which five seasons produced the highest/lowest goals total?

SELECT
  season,
  SUM(scored) season_total_goals
FROM  vw_ltables
GROUP BY season
ORDER BY season_total_goals DESC
LIMIT 5;
season season_total_goals
2023_2024 1246
1992_1993 1222
1994_1995 1195
1993_1994 1195
2021_2022 1084

A point worth noting here is that in those seasons from the early to mid 1990s, 460 games were played per season rather than the 380 that are played now. This is undoubtedly skewing our results. You need to know your data well when making such comparisons.

To find the five seasons producing the fewest goals we just need to switch the ORDER BY from DESC to ASC (the default).

SELECT
  season,
  SUM(scored) season_total_goals
FROM  vw_ltables
GROUP BY season
ORDER BY season_total_goals
LIMIT 5;

Here is the output:

season season_total_goals
2006_2007 931
2008_2009 942
2005_2006 944
1998_1999 959
1996_1997 970

Comparing clubs across season

We would like to generate a table of a club’s EPL position at the end of each season. To do this we will write the following macro:

CREATE OR REPLACE MACRO macros.club_league_positions(p_club) AS TABLE
SELECT
  season,
  league_position
FROM main.vw_ltables
WHERE ccode = p_club
ORDER BY league_position;

Here is an example of how we can call our macro to get Tottenham Hotspur’s final league positions for each season:

SELECT * 
FROM macros.league_positions_club('TOT');

As in every league, there are hotly contested local rivalries in the EPL. One suck rivalry is that between North London neighbours Arsenal and Tottenham so let’s generate a table comparing the two clubs’s finishing positions over the seasons.

Note how we sort by season ascending by extracting the first year of the season and converting it to an integer

SELECT 
  ars.season,
  ars.league_position ars_pos,
  tot.league_position tot_pos
FROM macros.league_positions_club('ARS') ars
JOIN (SELECT * FROM macros.league_positions_club('TOT')) tot
  ON ars.season = tot.season
ORDER BY CAST(SUBSTR(ars.season, 1, 4) AS INTEGER);

Here is the output for the first 10 seasons

season ars_pos tot_pos
1992_1993 10 8
1993_1994 4 15
1994_1995 12 7
1995_1996 4 8
1996_1997 3 10
1997_1998 1 15
1998_1999 2 11
1999_2000 2 10
2000_2001 2 12
2001_2002 1 9

Next up

We’ll look at date and time functions in duckDB.


<
Previous Post
Running totals with a window function 🦆⚽
>
Blog Archive
Archive of all previous blog posts