Practical work with `matplotlib` (1/2)

About those exercises

Dataset

We'll use some panel data: GDP (nominal) per year and per country. The dataset and its documentation are available here.

Objective

Apply what you have learned during this session:

Making graphs as elegant as possible

Since matplotlib enables us to make very flexible graphs, we can make them as elegant as possible. This is a good exercise to learn how to use matplotlib to its full potential.

Dataset Information

Source: World Bank GDP data
URL: https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv
Key Columns: Country Name, Country Code, Year, Value (GDP in USD)
Time Range: 1960-2020

Important information

Some entities are not countries, like regions, income groups, etc. In some cases, you should exclude them, in some other cases they can be very useful. Here is the list of these entities.

python

non_country_entities = [
    ['Africa Eastern and Southern', 'AFE'],
    ['Africa Western and Central', 'AFW'],
    ['Arab World', 'ARB'],
    ['Caribbean small states', 'CSS'],
    ['Central Europe and the Baltics', 'CEB'],
    ['Channel Islands', 'CHI'],
    ['Early-demographic dividend', 'EAR'],
    ['East Asia &amp; Pacific', 'EAS'],
    ['East Asia &amp; Pacific (IDA &amp; IBRD countries)', 'TEA'],
    ['East Asia &amp; Pacific (excluding high income)', 'EAP'],
    ['Euro area', 'EMU'],
    ['Europe &amp; Central Asia', 'ECS'],
    ['Europe &amp; Central Asia (IDA &amp; IBRD countries)', 'TEC'],
    ['Europe &amp; Central Asia (excluding high income)', 'ECA'],
    ['European Union', 'EUU'],
    ['Fragile and conflict affected situations', 'FCS'],
    ['Heavily indebted poor countries (HIPC)', 'HPC'],
    ['High income', 'HIC'],
    ['IBRD only', 'IBD'],
    ['IDA &amp; IBRD total', 'IBT'],
    ['IDA blend', 'IDB'],
    ['IDA only', 'IDX'],
    ['IDA total', 'IDA'],
    ['Late-demographic dividend', 'LTE'],
    ['Latin America &amp; Caribbean', 'LCN'],
    ['Latin America &amp; Caribbean (excluding high income)', 'LAC'],
    ['Latin America &amp; the Caribbean (IDA &amp; IBRD countries)', 'TLA'],
    ['Least developed countries: UN classification', 'LDC'],
    ['Low &amp; middle income', 'LMY'],
    ['Low income', 'LIC'],
    ['Lower middle income', 'LMC'],
    ['Middle East &amp; North Africa', 'MEA'],
    ['Middle East &amp; North Africa (IDA &amp; IBRD countries)', 'TMN'],
    ['Middle East &amp; North Africa (excluding high income)', 'MNA'],
    ['Middle income', 'MIC'],
    ['North America', 'NAC'],
    ['OECD members', 'OED'],
    ['Other small states', 'OSS'],
    ['Pacific island small states', 'PSS'],
    ['Post-demographic dividend', 'PST'],
    ['Pre-demographic dividend', 'PRE'],
    ['Small states', 'SST'],
    ['South Asia', 'SAS'],
    ['South Asia (IDA &amp; IBRD)', 'TSA'],
    ['Sub-Saharan Africa', 'SSF'],
    ['Sub-Saharan Africa (IDA &amp; IBRD countries)', 'TSS'],
    ['Sub-Saharan Africa (excluding high income)', 'SSA'],
    ['Upper middle income', 'UMC'],
    ['World', 'WLD']
]

Setup Code (Run First)

Using `pyodide`

python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

Local execution

python


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("gdp.csv")

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

Exercises

Exercise 1: Time Series Comparison

Task: Create a line plot comparing GDP evolution (2000-2020) for 5 economies. You will justify your choice of countries.

Requirements:

Different colors and line styles for each country
GDP values using a defensible scale
Proper legend with country names
Grid with 30% transparency
Title
1 annotation

Example:

Built with matplotlib

Exercise 2: Bar Chart

Task: Create a horizontal bar chart of the top 10 countries by 2019 GDP.

Requirements:

Use the 2019 data (more complete than 2020)
Horizontal bars with custom colors (use a colormap like 'viridis')
GDP values in trillions on x-axis
Country names on y-axis (truncate long names to 6 characters)
Grid
Title: "Top 10 Economies by GDP (2019)"
Figure size: (12, 8)

Example:

Built with matplotlib

Exercise 3: Working with ordinal data

Task: Create a stacked bar chart showing the distribution of countries across different GDP categories over time.

Context: We'll transform continuous GDP data into ordinal categories (Small, Medium, Large, Very Large economies) and visualize how the global economic landscape has evolved.

Requirements:

Use 2000, 2010, and 2019 data for comparison
Create 4 GDP categories using pd.cut() or pd.qcut():
- "Small" (< 100B dollars)
- "Medium" (100B - 500B dollars)
- "Large" (500B - 2T dollars)
- "Very Large" (> 2T dollars)
Create a stacked bar chart showing count of countries in each category by year
Use distinct colors for each category (suggest using a categorical colormap)
Add percentage labels on each stack segment
Title: "Evolution of Global Economy Distribution"
Figure size: (10, 6)
Legend positioned outside the plot area

Example:

Built with matplotlib

Exercise 4: Subplots example

Task: Create a 2x2 subplot showing different aspects of global economy.

Requirements:

Top-left: Line plot of World GDP over time (use 'WLD' code from original data)
Top-right: Pie chart of 2019 GDP share for top 8 countries + "Others"
Bottom-left: Bar chart showing GDP volatility (standard deviation 2010-2019) for top 10 economies
Bottom-right: Histogram of all countries' 2019 GDP (log scale, 30 bins)

Additional Requirements:

Figure size: (16, 12)
Consistent color schemes within each subplot
Proper titles, labels, and legends for each subplot
Main title: "Global Economy - Key Insights"
Use plt.tight_layout() for spacing

Example:

Built with matplotlib

Practical work with matplotlib (1/2)

About those exercises

Dataset

Objective

Apply what you have learned during this session:

Making graphs as elegant as possible

Dataset Information

Important information

Setup Code (Run First)

Using pyodide

Local execution

Exercises

Exercise 1: Time Series Comparison

Exercise 2: Bar Chart

Exercise 3: Working with ordinal data

Exercise 4: Subplots example

Practical work with `matplotlib` (1/2)

Using `pyodide`