Practical work with `matplotlib` (2/2) - correction

Any issue related to the proper execution of code on your machine must be solved during this session. Feel free to ask for help.

We'll use panel data: nominal GDP per year and per country. The dataset and its documentation are available here.

Objective: apply what you have learned during this session.

Since matplotlib enables us to make very flexible graphs, we can make them as elegant as possible. This is a good exercise to learn how to use matplotlib to its full potential.

Dataset Information

The dataset is the same as that used in the first practical work session: Practical work with matplotlib (1/2).

Source: World Bank GDP data
URL: https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv
Key Columns: Country Name, Country Code, Year, Value (GDP in USD)
Time Range: 1960-2020

Important information

Some entities are not countries but rather regions, income groups, etc. In some cases, you should exclude them; in other cases, they can be very useful. Here is the list of these entities.

python

non_country_entities = [
    ['Africa Eastern and Southern', 'AFE'],
    ['Africa Western and Central', 'AFW'],
    ['Arab World', 'ARB'],
    ['Caribbean small states', 'CSS'],
    ['Central Europe and the Baltics', 'CEB'],
    ['Channel Islands', 'CHI'],
    ['Early-demographic dividend', 'EAR'],
    ['East Asia & Pacific', 'EAS'],
    ['East Asia & Pacific (IDA & IBRD countries)', 'TEA'],
    ['East Asia & Pacific (excluding high income)', 'EAP'],
    ['Euro area', 'EMU'],
    ['Europe & Central Asia', 'ECS'],
    ['Europe & Central Asia (IDA & IBRD countries)', 'TEC'],
    ['Europe & Central Asia (excluding high income)', 'ECA'],
    ['European Union', 'EUU'],
    ['Fragile and conflict affected situations', 'FCS'],
    ['Heavily indebted poor countries (HIPC)', 'HPC'],
    ['High income', 'HIC'],
    ['IBRD only', 'IBD'],
    ['IDA & IBRD total', 'IBT'],
    ['IDA blend', 'IDB'],
    ['IDA only', 'IDX'],
    ['IDA total', 'IDA'],
    ['Late-demographic dividend', 'LTE'],
    ['Latin America & Caribbean', 'LCN'],
    ['Latin America & Caribbean (excluding high income)', 'LAC'],
    ['Latin America & the Caribbean (IDA & IBRD countries)', 'TLA'],
    ['Least developed countries: UN classification', 'LDC'],
    ['Low & middle income', 'LMY'],
    ['Low income', 'LIC'],
    ['Lower middle income', 'LMC'],
    ['Middle East & North Africa', 'MEA'],
    ['Middle East & North Africa (IDA & IBRD countries)', 'TMN'],
    ['Middle East & North Africa (excluding high income)', 'MNA'],
    ['Middle income', 'MIC'],
    ['North America', 'NAC'],
    ['OECD members', 'OED'],
    ['Other small states', 'OSS'],
    ['Pacific island small states', 'PSS'],
    ['Post-demographic dividend', 'PST'],
    ['Pre-demographic dividend', 'PRE'],
    ['Small states', 'SST'],
    ['South Asia', 'SAS'],
    ['South Asia (IDA & IBRD)', 'TSA'],
    ['Sub-Saharan Africa', 'SSF'],
    ['Sub-Saharan Africa (IDA & IBRD countries)', 'TSS'],
    ['Sub-Saharan Africa (excluding high income)', 'SSA'],
    ['Upper middle income', 'UMC'],
    ['World', 'WLD']
]
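
The setup code filters on country codes only, so the names in this list are not needed for filtering. Here is a minimal sketch of how a code set could be derived from name/code pairs like these and used to separate countries from aggregates. The three-row `sample_entities` excerpt and the `rows` records are illustrative stand-ins, not the full list or the real dataset.

```python
# Illustrative excerpt of the [name, code] pairs listed above
sample_entities = [
    ['Arab World', 'ARB'],
    ['Euro area', 'EMU'],
    ['World', 'WLD'],
]

# Keep only the codes; a set gives fast membership tests
sample_codes = {code for _, code in sample_entities}

# Hypothetical records standing in for rows of the GDP table
rows = [
    {'Country Name': 'France', 'Country Code': 'FRA'},
    {'Country Name': 'World', 'Country Code': 'WLD'},
    {'Country Name': 'Euro area', 'Country Code': 'EMU'},
]

# Split the records into real countries and aggregate entities
countries = [r for r in rows if r['Country Code'] not in sample_codes]
aggregates = [r for r in rows if r['Country Code'] in sample_codes]

print([r['Country Name'] for r in countries])
print([r['Country Name'] for r in aggregates])
```

Keeping both subsets mirrors the remark above: aggregates are excluded when ranking countries, but may be useful as reference lines.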

Setup Code (Run First)

Using `pyodide`

python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

Local execution

python


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("gdp.csv")

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

Exercises: Ranking

This exercise illustrates the Ranking section of the Visual Vocabulary - Financial Times Guide.

Ranking visualizations are essential for showing order and hierarchy in data. They help readers quickly identify leaders, laggards, and relative positions. In this exercise, you'll explore different ways to visualize rankings using GDP data.

Exercise 1.1: Ordered Bar Chart

Task: Create a horizontal bar chart showing the top 15 economies by GDP in 2019, ordered from highest to lowest (from bottom to top).

Requirements:

Sort countries by GDP value
Use a gradient colormap to emphasize ranking
Format GDP values in trillions
Add value labels at the end of each bar
Include gridlines for easier reading

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")


# Filter for 2019 data and get top 15 countries
df_2019 = df_countries[df_countries['Year'] == 2019].copy()
df_2019 = df_2019.sort_values('Value', ascending=False).head(15)

# Create figure
fig, ax = plt.subplots(figsize=(12, 8))

# Create color gradient
colors = plt.cm.viridis(np.linspace(0.3, 0.9, len(df_2019)))

# Create horizontal bar chart
bars = ax.barh(range(len(df_2019)), df_2019['Value'].values / 1e12, 
               color=colors, edgecolor='white', linewidth=1)

# Customize axes with rank labels included in country names
ax.set_yticks(range(len(df_2019)))
# Create labels with rank and country name combined
labels = [f"{name} ({i+1})" for i, name in enumerate(df_2019['Country Name'].values)]
ax.set_yticklabels(labels, fontsize=11)
ax.set_xlabel('GDP (Trillions USD)', fontsize=12, fontweight='bold')
ax.set_title('Top 15 Economies by GDP (2019)', fontsize=14, fontweight='bold', pad=20)

# Add value labels
for i, (bar, value) in enumerate(zip(bars, df_2019['Value'].values)):
    ax.text(value/1e12 + 0.1, bar.get_y() + bar.get_height()/2, 
            f'${value/1e12:.1f}T', 
            va='center', fontsize=10, color='#333333')

# Style improvements
ax.grid(True, axis='x', alpha=0.3, linestyle='--')
ax.set_axisbelow(True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_linewidth(0.5)
ax.spines['bottom'].set_linewidth(0.5)

plt.tight_layout()
plt.show()

print("💡 Ordered bar charts are ideal for comparing values and showing clear rankings")
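
The value labels above are built with an inline f-string. As a possible refactoring, the same formatting can be isolated in a small helper. `format_trillions` is a hypothetical name, and the sample values are made up for illustration.

```python
def format_trillions(value_usd, pos=None):
    """Return a GDP value in USD as a label such as '$21.4T'."""
    return f'${value_usd / 1e12:.1f}T'


# Made-up sample values, in USD
print(format_trillions(21.4e12))
print(format_trillions(5.0e12))
```

The optional `pos` argument matches the `(value, position)` signature expected by `matplotlib.ticker.FuncFormatter`, so the helper could also format axis ticks, provided the axis holds raw USD values rather than values already divided by `1e12`.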

Exercise 1.2: Lollipop Chart

Task: Create a lollipop chart comparing GDP growth between 2010 and 2019 for the G7 countries.

Requirements:

Show both 2010 and 2019 values on the same chart
Use different markers for each year
Sort by 2019 values
Add a legend and appropriate labels

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")


# G7 countries
g7_countries = ['United States', 'Japan', 'Germany', 'United Kingdom', 
                'France', 'Italy', 'Canada']

# Get data for G7 countries in 2010 and 2019
df_g7_2010 = df_countries[(df_countries['Country Name'].isin(g7_countries)) & 
                          (df_countries['Year'] == 2010)].copy()
df_g7_2019 = df_countries[(df_countries['Country Name'].isin(g7_countries)) & 
                          (df_countries['Year'] == 2019)].copy()

# Merge and sort by 2019 values
df_g7 = pd.merge(df_g7_2010[['Country Name', 'Value']], 
                 df_g7_2019[['Country Name', 'Value']], 
                 on='Country Name', suffixes=('_2010', '_2019'))
df_g7 = df_g7.sort_values('Value_2019', ascending=True)

# Create shorter labels for countries
country_labels = {
    'United States': 'USA',
    'United Kingdom': 'UK',
    'Japan': 'JPN',
    'Germany': 'GER',
    'France': 'FRA',
    'Italy': 'ITA',
    'Canada': 'CAN'
}
df_g7['Short_Name'] = df_g7['Country Name'].map(country_labels)

# Create figure with more horizontal space
fig, ax = plt.subplots(figsize=(14, 7))

# Plot lollipops
y_positions = np.arange(len(df_g7))

# Draw lines
for i, row in enumerate(df_g7.itertuples()):
    ax.plot([row.Value_2010/1e12, row.Value_2019/1e12], [i, i], 
            'gray', alpha=0.5, linewidth=2, zorder=1)

# Draw circles for 2010 with stronger color
ax.scatter(df_g7['Value_2010']/1e12, y_positions, 
          s=180, color='#E74C3C', edgecolor='darkred', linewidth=2, 
          zorder=2, label='2010', alpha=1.0)

# Draw squares for 2019 so that each year has its own marker as well as its own color
ax.scatter(df_g7['Value_2019']/1e12, y_positions, 
          s=180, marker='s', color='#27AE60', edgecolor='darkgreen', linewidth=2, 
          zorder=2, label='2019', alpha=1.0)

# Customize axes with short labels
ax.set_yticks(y_positions)
ax.set_yticklabels(df_g7['Short_Name'].values, fontsize=12, fontweight='bold')
ax.set_xlabel('GDP (Trillions USD)', fontsize=12, fontweight='bold')
ax.set_title('G7 Countries: GDP Evolution 2010 vs 2019', 
            fontsize=14, fontweight='bold', pad=20)

# Add value labels with stronger colors and smart positioning
for i, row in enumerate(df_g7.itertuples()):
    # Check whether GDP decreased between 2010 and 2019
    if row.Value_2019 < row.Value_2010:
        # For decreasing values, swap the label positions
        ax.text(row.Value_2019/1e12 - 0.3, i, f'${row.Value_2019/1e12:.1f}T', 
                ha='right', va='center', fontsize=10, color='#1E8449', fontweight='bold')
        ax.text(row.Value_2010/1e12 + 0.3, i, f'${row.Value_2010/1e12:.1f}T', 
                ha='left', va='center', fontsize=10, color='#C0392B', fontweight='bold')
    else:
        # Normal positioning for increasing values
        ax.text(row.Value_2010/1e12 - 0.3, i, f'${row.Value_2010/1e12:.1f}T', 
                ha='right', va='center', fontsize=10, color='#C0392B', fontweight='bold')
        ax.text(row.Value_2019/1e12 + 0.3, i, f'${row.Value_2019/1e12:.1f}T', 
                ha='left', va='center', fontsize=10, color='#1E8449', fontweight='bold')

# Style improvements
ax.grid(True, axis='x', alpha=0.3, linestyle=':')
ax.set_axisbelow(True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(loc='lower right', fontsize=11, framealpha=0.95)

# Add growth percentage labels above each line
for i, row in enumerate(df_g7.itertuples()):
    growth = (row.Value_2019 - row.Value_2010) / row.Value_2010 * 100
    mid_point = (row.Value_2010/1e12 + row.Value_2019/1e12) / 2
    # The '+' format flag prints the sign for both gains and losses
    ax.text(mid_point, i + 0.3, f'{growth:+.0f}%', 
            ha='center', va='bottom', fontsize=8, 
            color='green' if growth > 0 else 'red', fontweight='bold')

plt.tight_layout()
plt.show()

print("📊 Lollipop charts elegantly show changes between two time points")
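
The growth labels combine a percentage change with a sign. Below is a small sketch of Python's `+` format flag, which prints the sign for both positive and negative numbers. `growth_label` is a hypothetical helper and the inputs are made-up values, not figures from the dataset.

```python
def growth_label(start, end):
    """Percentage change from start to end, with an explicit sign."""
    growth = (end - start) / start * 100
    return f'{growth:+.0f}%'


print(growth_label(2.0, 2.5))  # an increase
print(growth_label(2.0, 1.8))  # a decrease
```

Letting the format specification handle the sign avoids hard-coding a `+` prefix, which would give labels such as `+-10%` for a country whose GDP fell.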

Exercise 1.3: Slope Chart

Task: Create a slope chart showing how the ranking of the top 10 economies changed between 2000 and 2019.

Requirements:

Show rankings (not values) on the y-axis
Connect same countries with lines
Color code lines by change direction
Label countries on both sides

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

# Get rankings for all countries in both years
df_2000 = df_countries[df_countries['Year'] == 2000].copy()
df_2000 = df_2000.sort_values('Value', ascending=False).reset_index(drop=True)
df_2000['Rank_2000'] = df_2000.index + 1

df_2019 = df_countries[df_countries['Year'] == 2019].copy()
df_2019 = df_2019.sort_values('Value', ascending=False).reset_index(drop=True)
df_2019['Rank_2019'] = df_2019.index + 1

# Get top 10 from each year
top10_2000 = set(df_2000.head(10)['Country Name'])
top10_2019 = set(df_2019.head(10)['Country Name'])

# Countries that were in top 10 in either year
countries_to_show = top10_2000.union(top10_2019)

# Filter and merge data for these countries
df_2000_filtered = df_2000[df_2000['Country Name'].isin(countries_to_show)]
df_2019_filtered = df_2019[df_2019['Country Name'].isin(countries_to_show)]

df_merged = pd.merge(df_2000_filtered[['Country Name', 'Rank_2000']], 
                     df_2019_filtered[['Country Name', 'Rank_2019']], 
                     on='Country Name', how='inner')

# Create figure
fig, ax = plt.subplots(figsize=(14, 10))

# Plot lines
for _, row in df_merged.iterrows():
    # Determine color based on rank change
    if row['Rank_2019'] < row['Rank_2000']:
        color = '#2ECC71'  # Green for improvement
        linewidth = 2.5
    elif row['Rank_2019'] > row['Rank_2000']:
        color = '#E74C3C'  # Red for decline
        linewidth = 2.5
    else:
        color = '#95A5A6'  # Gray for no change
        linewidth = 1.5

    # Draw line
    ax.plot([0, 1], [row['Rank_2000'], row['Rank_2019']], 
            color=color, linewidth=linewidth, alpha=0.7)

    # Add markers
    ax.scatter(0, row['Rank_2000'], s=100, color=color, zorder=3, alpha=0.9)
    ax.scatter(1, row['Rank_2019'], s=100, color=color, zorder=3, alpha=0.9)

    # Add country labels with rank always closest to axis
    ax.text(-0.05, row['Rank_2000'], f"{row['Country Name']} ({int(row['Rank_2000'])})", 
            ha='right', va='center', fontsize=10, fontweight='bold')
    ax.text(1.05, row['Rank_2019'], f"({int(row['Rank_2019'])}) {row['Country Name']}", 
            ha='left', va='center', fontsize=10, fontweight='bold')

# Customize axes - adjust y-limits to show all ranks
max_rank = max(df_merged['Rank_2000'].max(), df_merged['Rank_2019'].max())
ax.set_xlim(-0.3, 1.3)
ax.set_ylim(max_rank + 1, 0)  # Invert y-axis so rank 1 is at top
ax.set_xticks([0, 1])
ax.set_xticklabels(['2000', '2019'], fontsize=14, fontweight='bold')
ax.set_ylabel('Rank', fontsize=12, fontweight='bold')
ax.set_title('Top 10 Economies: Ranking Changes 2000-2019', 
            fontsize=16, fontweight='bold', pad=20)

# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)

# Add vertical lines at x positions
ax.axvline(0, color='black', linewidth=1, alpha=0.3)
ax.axvline(1, color='black', linewidth=1, alpha=0.3)

# Add legend
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color='#2ECC71', lw=3, label='Improved ranking'),
                   Line2D([0], [0], color='#E74C3C', lw=3, label='Declined ranking'),
                   Line2D([0], [0], color='#95A5A6', lw=2, label='No change')]
ax.legend(handles=legend_elements, loc='lower center', 
         bbox_to_anchor=(0.5, -0.1), ncol=3, frameon=False)

# Remove y-axis ticks
ax.set_yticks([])

plt.tight_layout()
plt.show()

print("📈 Slope charts effectively show ranking changes over time")
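
The solution above derives ranks by sorting each year and numbering the rows. An alternative sketch uses `Series.rank`, shown here on hypothetical toy values rather than the GDP dataset. `method='first'` is chosen so that every row receives a distinct rank even when values tie.

```python
import pandas as pd

# Hypothetical toy data, not taken from the GDP dataset
toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'D'],
    'Value': [3.0, 10.0, 7.0, 1.0],
})

# Rank 1 goes to the largest value; the original row order is preserved
toy['Rank'] = toy['Value'].rank(ascending=False, method='first').astype(int)

ranks = dict(zip(toy['Country Name'], toy['Rank']))
print(ranks)
```

This keeps the DataFrame in its original order, which can be convenient when ranks for several years need to be merged afterwards. Missing values would need to be dropped first, since `rank` leaves them as `NaN` and the integer cast would fail.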

Exercise 1.4: Dot Strip Plot

Task: Create a dot strip plot showing nominal GDP ranges for different regions in 2019.

Requirements:

Group by regions
Show individual countries as dots
Highlight median values

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")


# Comprehensive regional classification for all countries
regions_map = {
    # North America
    'United States': 'North America',
    'Canada': 'North America',
    'Mexico': 'North America',
    'Bermuda': 'North America',
    'Greenland': 'North America',

    # Europe
    'Germany': 'Europe',
    'France': 'Europe',
    'United Kingdom': 'Europe',
    'Italy': 'Europe',
    'Spain': 'Europe',
    'Netherlands': 'Europe',
    'Belgium': 'Europe',
    'Switzerland': 'Europe',
    'Austria': 'Europe',
    'Sweden': 'Europe',
    'Norway': 'Europe',
    'Denmark': 'Europe',
    'Finland': 'Europe',
    'Ireland': 'Europe',
    'Portugal': 'Europe',
    'Greece': 'Europe',
    'Poland': 'Europe',
    'Czechia': 'Europe',
    'Romania': 'Europe',
    'Hungary': 'Europe',
    'Bulgaria': 'Europe',
    'Croatia': 'Europe',
    'Slovak Republic': 'Europe',
    'Slovenia': 'Europe',
    'Lithuania': 'Europe',
    'Latvia': 'Europe',
    'Estonia': 'Europe',
    'Luxembourg': 'Europe',
    'Malta': 'Europe',
    'Cyprus': 'Europe',
    'Iceland': 'Europe',
    'Albania': 'Europe',
    'Serbia': 'Europe',
    'Bosnia and Herzegovina': 'Europe',
    'North Macedonia': 'Europe',
    'Montenegro': 'Europe',
    'Kosovo': 'Europe',
    'Moldova': 'Europe',
    'Belarus': 'Europe',
    'Ukraine': 'Europe',
    'Russian Federation': 'Europe',
    'Andorra': 'Europe',
    'Monaco': 'Europe',
    'Liechtenstein': 'Europe',
    'San Marino': 'Europe',
    'Faroe Islands': 'Europe',
    'Isle of Man': 'Europe',
    'Channel Islands': 'Europe',

    # East Asia & Pacific
    'China': 'East Asia & Pacific',
    'Japan': 'East Asia & Pacific',
    'Korea, Rep.': 'East Asia & Pacific',
    'Indonesia': 'East Asia & Pacific',
    'Thailand': 'East Asia & Pacific',
    'Malaysia': 'East Asia & Pacific',
    'Singapore': 'East Asia & Pacific',
    'Philippines': 'East Asia & Pacific',
    'Viet Nam': 'East Asia & Pacific',
    'Myanmar': 'East Asia & Pacific',
    'Cambodia': 'East Asia & Pacific',
    'Lao PDR': 'East Asia & Pacific',
    'Hong Kong SAR, China': 'East Asia & Pacific',
    'Macao SAR, China': 'East Asia & Pacific',
    'Mongolia': 'East Asia & Pacific',
    'Brunei Darussalam': 'East Asia & Pacific',
    'Timor-Leste': 'East Asia & Pacific',
    'Australia': 'East Asia & Pacific',
    'New Zealand': 'East Asia & Pacific',
    'Papua New Guinea': 'East Asia & Pacific',
    'Fiji': 'East Asia & Pacific',
    'Solomon Islands': 'East Asia & Pacific',
    'Vanuatu': 'East Asia & Pacific',
    'Samoa': 'East Asia & Pacific',
    'Tonga': 'East Asia & Pacific',
    'Kiribati': 'East Asia & Pacific',
    'Palau': 'East Asia & Pacific',
    'Marshall Islands': 'East Asia & Pacific',
    'Micronesia, Fed. Sts.': 'East Asia & Pacific',
    'Nauru': 'East Asia & Pacific',
    'Tuvalu': 'East Asia & Pacific',
    'American Samoa': 'East Asia & Pacific',
    'French Polynesia': 'East Asia & Pacific',
    'Guam': 'East Asia & Pacific',
    'New Caledonia': 'East Asia & Pacific',
    'Northern Mariana Islands': 'East Asia & Pacific',

    # South Asia
    'India': 'South Asia',
    'Pakistan': 'South Asia',
    'Bangladesh': 'South Asia',
    'Sri Lanka': 'South Asia',
    'Nepal': 'South Asia',
    'Afghanistan': 'South Asia',
    'Bhutan': 'South Asia',
    'Maldives': 'South Asia',

    # Middle East & North Africa
    'Saudi Arabia': 'Middle East & North Africa',
    'United Arab Emirates': 'Middle East & North Africa',
    'Egypt, Arab Rep.': 'Middle East & North Africa',
    'Israel': 'Middle East & North Africa',
    'Iran, Islamic Rep.': 'Middle East & North Africa',
    'Iraq': 'Middle East & North Africa',
    'Algeria': 'Middle East & North Africa',
    'Morocco': 'Middle East & North Africa',
    'Kuwait': 'Middle East & North Africa',
    'Qatar': 'Middle East & North Africa',
    'Oman': 'Middle East & North Africa',
    'Lebanon': 'Middle East & North Africa',
    'Jordan': 'Middle East & North Africa',
    'Tunisia': 'Middle East & North Africa',
    'Libya': 'Middle East & North Africa',
    'Bahrain': 'Middle East & North Africa',
    'Yemen, Rep.': 'Middle East & North Africa',
    'Syrian Arab Republic': 'Middle East & North Africa',
    'West Bank and Gaza': 'Middle East & North Africa',
    'Djibouti': 'Middle East & North Africa',

    # Sub-Saharan Africa
    'Nigeria': 'Sub-Saharan Africa',
    'South Africa': 'Sub-Saharan Africa',
    'Ethiopia': 'Sub-Saharan Africa',
    'Kenya': 'Sub-Saharan Africa',
    'Ghana': 'Sub-Saharan Africa',
    'Angola': 'Sub-Saharan Africa',
    'Tanzania': 'Sub-Saharan Africa',
    'Uganda': 'Sub-Saharan Africa',
    'Zimbabwe': 'Sub-Saharan Africa',
    'Mozambique': 'Sub-Saharan Africa',
    'Zambia': 'Sub-Saharan Africa',
    'Senegal': 'Sub-Saharan Africa',
    'Mali': 'Sub-Saharan Africa',
    'Burkina Faso': 'Sub-Saharan Africa',
    'Niger': 'Sub-Saharan Africa',
    'Malawi': 'Sub-Saharan Africa',
    'Madagascar': 'Sub-Saharan Africa',
    'Cameroon': 'Sub-Saharan Africa',
    "Cote d'Ivoire": 'Sub-Saharan Africa',
    'Guinea': 'Sub-Saharan Africa',
    'Benin': 'Sub-Saharan Africa',
    'Rwanda': 'Sub-Saharan Africa',
    'Chad': 'Sub-Saharan Africa',
    'Somalia': 'Sub-Saharan Africa',
    'Burundi': 'Sub-Saharan Africa',
    'Togo': 'Sub-Saharan Africa',
    'Sierra Leone': 'Sub-Saharan Africa',
    'Liberia': 'Sub-Saharan Africa',
    'Central African Republic': 'Sub-Saharan Africa',
    'Mauritania': 'Sub-Saharan Africa',
    'Eritrea': 'Sub-Saharan Africa',
    'Gambia, The': 'Sub-Saharan Africa',
    'Botswana': 'Sub-Saharan Africa',
    'Namibia': 'Sub-Saharan Africa',
    'Gabon': 'Sub-Saharan Africa',
    'Lesotho': 'Sub-Saharan Africa',
    'Guinea-Bissau': 'Sub-Saharan Africa',
    'Equatorial Guinea': 'Sub-Saharan Africa',
    'Mauritius': 'Sub-Saharan Africa',
    'Eswatini': 'Sub-Saharan Africa',
    'Congo, Dem. Rep.': 'Sub-Saharan Africa',
    'Congo, Rep.': 'Sub-Saharan Africa',
    'Cabo Verde': 'Sub-Saharan Africa',
    'Comoros': 'Sub-Saharan Africa',
    'Sao Tome and Principe': 'Sub-Saharan Africa',
    'Seychelles': 'Sub-Saharan Africa',
    'Sudan': 'Sub-Saharan Africa',
    'South Sudan': 'Sub-Saharan Africa',

    # Latin America & Caribbean
    'Brazil': 'Latin America & Caribbean',
    'Argentina': 'Latin America & Caribbean',
    'Colombia': 'Latin America & Caribbean',
    'Chile': 'Latin America & Caribbean',
    'Peru': 'Latin America & Caribbean',
    'Venezuela, RB': 'Latin America & Caribbean',
    'Ecuador': 'Latin America & Caribbean',
    'Bolivia': 'Latin America & Caribbean',
    'Paraguay': 'Latin America & Caribbean',
    'Uruguay': 'Latin America & Caribbean',
    'Guatemala': 'Latin America & Caribbean',
    'Cuba': 'Latin America & Caribbean',
    'Dominican Republic': 'Latin America & Caribbean',
    'Haiti': 'Latin America & Caribbean',
    'Honduras': 'Latin America & Caribbean',
    'El Salvador': 'Latin America & Caribbean',
    'Nicaragua': 'Latin America & Caribbean',
    'Costa Rica': 'Latin America & Caribbean',
    'Panama': 'Latin America & Caribbean',
    'Jamaica': 'Latin America & Caribbean',
    'Trinidad and Tobago': 'Latin America & Caribbean',
    'Guyana': 'Latin America & Caribbean',
    'Suriname': 'Latin America & Caribbean',
    'Belize': 'Latin America & Caribbean',
    'Barbados': 'Latin America & Caribbean',
    'Bahamas, The': 'Latin America & Caribbean',
    'Puerto Rico': 'Latin America & Caribbean',
    'St. Lucia': 'Latin America & Caribbean',
    'Grenada': 'Latin America & Caribbean',
    'St. Vincent and the Grenadines': 'Latin America & Caribbean',
    'Antigua and Barbuda': 'Latin America & Caribbean',
    'Dominica': 'Latin America & Caribbean',
    'St. Kitts and Nevis': 'Latin America & Caribbean',
    'Cayman Islands': 'Latin America & Caribbean',
    'Aruba': 'Latin America & Caribbean',
    'Virgin Islands (U.S.)': 'Latin America & Caribbean',
    'Curacao': 'Latin America & Caribbean',
    'Sint Maarten (Dutch part)': 'Latin America & Caribbean',
    'Turks and Caicos Islands': 'Latin America & Caribbean',
    'St. Martin (French part)': 'Latin America & Caribbean',

    # Central Asia
    'Kazakhstan': 'Central Asia',
    'Uzbekistan': 'Central Asia',
    'Turkmenistan': 'Central Asia',
    'Tajikistan': 'Central Asia',
    'Kyrgyz Republic': 'Central Asia',
    'Azerbaijan': 'Central Asia',
    'Armenia': 'Central Asia',
    'Georgia': 'Central Asia',

    # Turkey (Bridge between Europe and Asia)
    'Turkiye': 'Europe'  # Often classified with Europe
}

# Filter for 2019 and get top 50 countries by GDP
df_2019 = df_countries[df_countries['Year'] == 2019].copy()
df_2019 = df_2019.sort_values('Value', ascending=False).head(50)

# Add region column
df_2019['Region'] = df_2019['Country Name'].map(regions_map)

# Remove countries without region mapping (copy to avoid SettingWithCopyWarning)
df_2019_selected = df_2019.dropna(subset=['Region']).copy()

# Convert GDP to billions of USD for readability
# (this is a unit change only, not a per-capita normalization)
df_2019_selected['Value_normalized'] = df_2019_selected['Value'] / 1e9

# Create figure
fig, ax = plt.subplots(figsize=(14, 8))

# Get unique regions
regions = df_2019_selected['Region'].unique()
region_colors = plt.cm.Set3(np.linspace(0, 1, len(regions)))

# Plot dots for each region
y_offset = 0
y_labels = []
y_positions = []

for i, region in enumerate(regions):
    region_data = df_2019_selected[df_2019_selected['Region'] == region]

    # Sort values within region
    region_data = region_data.sort_values('Value_normalized')

    # Plot dots
    x_values = region_data['Value_normalized'].values
    y_values = [y_offset] * len(x_values)

    ax.scatter(x_values, y_values, s=120, color=region_colors[i], 
              alpha=0.7, edgecolor='white', linewidth=1.5, zorder=3)

    # Add country label for max value only
    # Find the maximum country
    max_idx = region_data['Value_normalized'].idxmax()
    max_country = region_data.loc[max_idx, 'Country Name']
    max_value = region_data.loc[max_idx, 'Value_normalized']

    # Label only the maximum country - positioned slightly to the right
    ax.text(max_value + 20, y_offset, max_country, 
           ha='left', va='center', fontsize=8, rotation=0)

    # Calculate and plot median
    median_val = np.median(x_values)
    ax.plot([median_val, median_val], [y_offset - 0.2, y_offset + 0.2], 
           color='black', linewidth=2, zorder=4)

    # Add region label
    y_labels.append(region)
    y_positions.append(y_offset)

    # Add range line
    ax.plot([x_values.min(), x_values.max()], [y_offset, y_offset], 
           color='gray', alpha=0.3, linewidth=1, zorder=1)

    y_offset += 1

# Customize axes
ax.set_yticks(y_positions)
ax.set_yticklabels(y_labels, fontsize=12, fontweight='bold')
ax.set_xlabel('GDP (Billions USD)', fontsize=12, fontweight='bold')
ax.set_title('GDP Distribution by Region (2019)', 
            fontsize=14, fontweight='bold', pad=20)

# Style improvements
ax.grid(True, axis='x', alpha=0.3, linestyle=':')
ax.set_axisbelow(True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_linewidth(0.5)
ax.spines['bottom'].set_linewidth(0.5)

# Add legend for median and labeled countries
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
legend_elements = [
    Line2D([0], [0], color='black', lw=2, label='Median GDP'),
    Patch(facecolor='none', edgecolor='none', label='Labels show largest economy per region')
]
ax.legend(handles=legend_elements, loc='upper right', frameon=True, framealpha=0.95, 
         fontsize=10)

plt.tight_layout()
plt.show()

print("🔵 Dot strip plots show distributions and individual data points clearly")
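
In the solution above, the median is computed region by region inside the plotting loop. As a cross-check, the same statistic can be obtained in one step with `groupby`. The sketch below uses made-up regions and values, not the GDP dataset.

```python
import pandas as pd

# Hypothetical toy data: two regions with a few values each
toy = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South'],
    'Value': [1.0, 5.0, 3.0, 10.0, 20.0],
})

# One median per region, indexed by region name
medians = toy.groupby('Region')['Value'].median()
print(medians)
```

Computing the medians up front also makes it possible to order the regions by median before plotting, if that ordering is preferred.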

Exercise 1.5: Bump Chart

Task: Create a bump chart showing ranking evolution of selected economies from 2010 to 2019.

Requirements:

Track ranking changes year by year
Use smooth lines to connect rankings
Apply distinct colors for each country
Show all intermediate years


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities  = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")

# Select countries to track
countries_to_track = ['United States', 'China', 'Japan', 'Germany', 
                     'India', 'United Kingdom', 'France', 'Brazil']

# Get data for years 2010-2019
years = range(2010, 2020)
rankings_data = {}

for year in years:
    df_year = df_countries[df_countries['Year'] == year].copy()
    df_year = df_year.sort_values('Value', ascending=False)
    df_year['Rank'] = range(1, len(df_year) + 1)

    for country in countries_to_track:
        if country not in rankings_data:
            rankings_data[country] = []

        country_rank = df_year[df_year['Country Name'] == country]['Rank'].values
        if len(country_rank) > 0:
            rankings_data[country].append(country_rank[0])
        else:
            rankings_data[country].append(None)

# Create figure
fig, ax = plt.subplots(figsize=(14, 8))

# Color palette
colors = plt.cm.tab10(np.linspace(0, 1, len(countries_to_track)))

# Plot lines for each country
for i, (country, ranks) in enumerate(rankings_data.items()):
    # Remove None values for plotting
    valid_years = [y for y, r in zip(years, ranks) if r is not None]
    valid_ranks = [r for r in ranks if r is not None]

    # Plot line with markers
    ax.plot(valid_years, valid_ranks, 'o-', color=colors[i], 
           linewidth=2.5, markersize=8, label=country, alpha=0.8)

    # Add country name at the end
    if len(valid_ranks) > 0:
        ax.text(valid_years[-1] + 0.1, valid_ranks[-1], country, 
               va='center', fontsize=10, color=colors[i], fontweight='bold')

# Customize axes
ax.set_xlim(2009.5, 2020.5)
ax.set_ylim(12, 0.5)  # Invert y-axis
ax.set_xlabel('Year', fontsize=12, fontweight='bold')
ax.set_ylabel('Global GDP Rank', fontsize=12, fontweight='bold')
ax.set_title('Economic Ranking Evolution 2010-2019', 
            fontsize=14, fontweight='bold', pad=20)

# Set x-axis ticks
ax.set_xticks(years)
ax.set_xticklabels(years, rotation=45)

# Set y-axis ticks
ax.set_yticks(range(1, 13))
ax.set_yticklabels([f'{i}' for i in range(1, 13)])

# Add grid
ax.grid(True, alpha=0.3, linestyle=':')
ax.set_axisbelow(True)

# Style improvements
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

print("📊 Bump charts excel at showing ranking changes across multiple time periods")
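
The year-by-year loop above builds the ranks by sorting each year and numbering the rows. As an alternative sketch, assuming a toy table with made-up values in place of the World Bank file, the same kind of ranking can be computed in one pass with `groupby` and `rank`:

```python
import pandas as pd

# Toy data (made-up values, not taken from the World Bank file)
toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Year':         [2010, 2010, 2010, 2011, 2011, 2011],
    'Value':        [5.0, 9.0, 7.0, 8.0, 9.5, 7.5],
})

# Rank within each year: 1 = largest GDP. method='first' breaks ties by row
# order, so every rank is used exactly once within a year.
toy['Rank'] = (toy.groupby('Year')['Value']
                  .rank(ascending=False, method='first')
                  .astype(int))

# One row per year, one column per country: each column is one line of the bump chart
rank_table = toy.pivot(index='Year', columns='Country Name', values='Rank')
print(rank_table)
```

Rows with a missing `Value` would need to be dropped first (for example with `dropna(subset=['Value'])`), since `astype(int)` cannot convert missing ranks.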

Exercise 1.6: Ordered Proportional Symbol

Task: Create a proportional symbol chart showing GDP sizes with country positions based on GDP growth rate.

Requirements:

Calculate growth rate between 2010 and 2019
Filter for countries with significant GDP (top 30 by 2019 GDP)
Size circles by 2019 GDP
Position on x-axis by growth rate
Color by GDP size category
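
The growth rate in the first requirement is the relative change between the two years: (GDP in 2019 - GDP in 2010) / GDP in 2010 × 100. A small worked sketch on made-up numbers (countries `A` to `D` are placeholders, not real data) shows the arithmetic and one side effect of merging the two years:

```python
import pandas as pd

# Made-up GDP values for two years; 'C' has no 2019 value and 'D' no 2010 value
gdp_2010 = pd.DataFrame({'Country Name': ['A', 'B', 'C'],
                         'Value': [100.0, 200.0, 50.0]})
gdp_2019 = pd.DataFrame({'Country Name': ['A', 'B', 'D'],
                         'Value': [150.0, 180.0, 80.0]})

# pd.merge defaults to an inner join: only countries present in both years survive
growth = pd.merge(gdp_2010, gdp_2019, on='Country Name',
                  suffixes=('_2010', '_2019'))

# Relative change in percent: A grows by about 50 %, B shrinks by about 10 %
growth['Growth_Rate'] = ((growth['Value_2019'] - growth['Value_2010'])
                         / growth['Value_2010'] * 100)
print(growth)
```

Because of the inner join, a country missing in either year silently disappears from the result, which is worth keeping in mind when counting how many countries remain before taking the top 30.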

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url

# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))

# Exclude non-country entities (regions, income groups)
non_country_entities = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP', 
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT', 
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC', 
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST', 
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]

df_non_countries = df[df['Country Code'].isin(non_country_entities)]

print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")



# Get data for 2010 and 2019
df_2010 = df_countries[df_countries['Year'] == 2010][['Country Name', 'Value']]
df_2019 = df_countries[df_countries['Year'] == 2019][['Country Name', 'Value']]

# Merge and calculate growth
df_growth = pd.merge(df_2010, df_2019, on='Country Name', suffixes=('_2010', '_2019'))
df_growth['Growth_Rate'] = ((df_growth['Value_2019'] - df_growth['Value_2010']) / 
                            df_growth['Value_2010'] * 100)

# Filter for countries with significant GDP (top 30 in 2019)
df_growth = df_growth.nlargest(30, 'Value_2019')

# Create figure
fig, ax = plt.subplots(figsize=(14, 8))

# Normalize bubble sizes
max_gdp = df_growth['Value_2019'].max()
sizes = (df_growth['Value_2019'] / max_gdp * 3000) + 100

# Create color categories
df_growth['Size_Category'] = pd.cut(df_growth['Value_2019'], 
                                     bins=[0, 1e12, 5e12, 10e12, float('inf')],
                                     labels=['< $1T', '$1-5T', '$5-10T', '> $10T'])

# Color map
color_map = {'< $1T': '#3498DB', '$1-5T': '#2ECC71', 
             '$5-10T': '#F39C12', '> $10T': '#E74C3C'}
colors = [color_map[cat] for cat in df_growth['Size_Category']]

# Plot bubbles
scatter = ax.scatter(df_growth['Growth_Rate'], 
                    range(len(df_growth)), 
                    s=sizes, c=colors, alpha=0.6, 
                    edgecolor='white', linewidth=2)

# Add country labels
for i, row in enumerate(df_growth.itertuples()):
    # Only label large economies or high growth
    if row.Value_2019 > 2e12 or row.Growth_Rate > 200:
        # Note: 'Country Name' contains a space, so itertuples() renames that field
        # to a positional name; look the country up by position instead
        country_name = df_growth.iloc[i]['Country Name']
        ax.text(row.Growth_Rate, i, country_name, 
               fontsize=9, ha='center', va='center')

# Customize axes
ax.set_xlabel('GDP Growth Rate 2010-2019 (%)', fontsize=12, fontweight='bold')
ax.set_ylabel('Countries (ordered by 2019 GDP)', fontsize=12, fontweight='bold')
ax.set_title('GDP Size and Growth Rate (2010-2019)', 
            fontsize=14, fontweight='bold', pad=20)

# Remove y-axis labels
ax.set_yticks([])

# Add vertical line at 0% growth
ax.axvline(0, color='gray', linestyle='--', alpha=0.5)

# Add vertical line at 100% growth
ax.axvline(100, color='green', linestyle=':', alpha=0.3)
ax.text(100, ax.get_ylim()[1] * 0.95, 'Doubled GDP', 
       ha='center', fontsize=9, color='green')

# Grid
ax.grid(True, axis='x', alpha=0.3, linestyle=':')
ax.set_axisbelow(True)

# Style
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)

# Legend
from matplotlib.patches import Circle
legend_elements = [Circle((0, 0), 1, fc=color, alpha=0.6, 
                         edgecolor='white', linewidth=2, 
                         label=label) 
                  for label, color in color_map.items()]

legend1 = ax.legend(handles=legend_elements, loc='upper left', 
                   title='GDP Size (2019)', framealpha=0.95)

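# Size legend: legend entries built from Circle patches are drawn as same-sized boxes,
# so marker handles are used instead (markersize is a width in points, scatter's s an area)
from matplotlib.lines import Line2D
size_legend_elements = [
    Line2D([], [], linestyle='', marker='o', color='gray', alpha=0.3,
           markersize=np.sqrt(s), label=label)
    for label, s in [('Small', 500), ('Medium', 1500), ('Large', 3000)]
]
ax.add_artist(legend1)  # Keep the first legend when the second one is created
ax.legend(handles=size_legend_elements, loc='upper right', 
         title='Relative GDP', framealpha=0.95,
         labelspacing=4, borderpad=2.5, handletextpad=2.5)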

plt.tight_layout()
plt.show()

print("⭕ Proportional symbols effectively show multiple dimensions of ranking data")
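
In `ax.scatter`, `s` is a marker area in points squared, so the `+ 100` offset used in the solution keeps small economies visible at the cost of strict proportionality. A minimal sketch with round, made-up GDP values illustrates the effect:

```python
import numpy as np

# Round, made-up GDP values in USD (illustrative only)
gdp = np.array([20e12, 10e12, 5e12, 1e12])

# Same normalization as in the solution: area in points^2, with a floor of 100
sizes = gdp / gdp.max() * 3000 + 100

# Equivalent marker width in points (the quantity Line2D calls markersize)
diameters = np.sqrt(sizes)

# GDP ratio vs. drawn area ratio between the largest and smallest economy
gdp_ratio = gdp[0] / gdp[-1]
area_ratio = sizes[0] / sizes[-1]
print(sizes, gdp_ratio, area_ratio)
```

Here a 20-fold difference in GDP is drawn as a 12.4-fold difference in area. Dropping the offset would restore proportionality, but the smallest bubbles could then become hard to see.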
