Practical work with matplotlib (2/2)
Any issue related to the proper execution of code on your machine must be solved during this session. Feel free to ask for help.
We'll use panel data: nominal GDP per year and per country. The dataset and its documentation are available here.
Objective: apply what you have learned during this session. Since matplotlib lets us build very flexible graphs, we can make them as elegant as possible. This is a good exercise for learning to use matplotlib to its full potential.
Dataset Information
The dataset is the same as that used in the first practical work session: Practical work with matplotlib (1/2).
- Source: World Bank GDP data
- URL: https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv
- Key Columns: Country Name, Country Code, Year, Value (GDP in USD)
- Time Range: 1960-2020
Important information
Some entities are not countries but rather regions, income groups, etc. In some cases, you should exclude them; in other cases, they can be very useful. Here is the list of these entities.
non_country_entities = [
    ['Africa Eastern and Southern', 'AFE'],
    ['Africa Western and Central', 'AFW'],
    ['Arab World', 'ARB'],
    ['Caribbean small states', 'CSS'],
    ['Central Europe and the Baltics', 'CEB'],
    ['Channel Islands', 'CHI'],
    ['Early-demographic dividend', 'EAR'],
    ['East Asia & Pacific', 'EAS'],
    ['East Asia & Pacific (IDA & IBRD countries)', 'TEA'],
    ['East Asia & Pacific (excluding high income)', 'EAP'],
    ['Euro area', 'EMU'],
    ['Europe & Central Asia', 'ECS'],
    ['Europe & Central Asia (IDA & IBRD countries)', 'TEC'],
    ['Europe & Central Asia (excluding high income)', 'ECA'],
    ['European Union', 'EUU'],
    ['Fragile and conflict affected situations', 'FCS'],
    ['Heavily indebted poor countries (HIPC)', 'HPC'],
    ['High income', 'HIC'],
    ['IBRD only', 'IBD'],
    ['IDA & IBRD total', 'IBT'],
    ['IDA blend', 'IDB'],
    ['IDA only', 'IDX'],
    ['IDA total', 'IDA'],
    ['Late-demographic dividend', 'LTE'],
    ['Latin America & Caribbean', 'LCN'],
    ['Latin America & Caribbean (excluding high income)', 'LAC'],
    ['Latin America & the Caribbean (IDA & IBRD countries)', 'TLA'],
    ['Least developed countries: UN classification', 'LDC'],
    ['Low & middle income', 'LMY'],
    ['Low income', 'LIC'],
    ['Lower middle income', 'LMC'],
    ['Middle East & North Africa', 'MEA'],
    ['Middle East & North Africa (IDA & IBRD countries)', 'TMN'],
    ['Middle East & North Africa (excluding high income)', 'MNA'],
    ['Middle income', 'MIC'],
    ['North America', 'NAC'],
    ['OECD members', 'OED'],
    ['Other small states', 'OSS'],
    ['Pacific island small states', 'PSS'],
    ['Post-demographic dividend', 'PST'],
    ['Pre-demographic dividend', 'PRE'],
    ['Small states', 'SST'],
    ['South Asia', 'SAS'],
    ['South Asia (IDA & IBRD)', 'TSA'],
    ['Sub-Saharan Africa', 'SSF'],
    ['Sub-Saharan Africa (IDA & IBRD countries)', 'TSS'],
    ['Sub-Saharan Africa (excluding high income)', 'SSA'],
    ['Upper middle income', 'UMC'],
    ['World', 'WLD']
]
Setup Code (Run First)
Using pyodide
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pyodide.http import open_url
# Load data
url = "https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv"
df = pd.read_csv(open_url(url))
# Exclude non-country entities (regions, income groups)
non_country_entities = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP',
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT',
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC',
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST',
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]
df_non_countries = df[df['Country Code'].isin(non_country_entities)]
print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")
Local execution
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Load data (assumes gdp.csv has been downloaded from the URL above
# into the current working directory)
df = pd.read_csv("gdp.csv")
# Exclude non-country entities (regions, income groups)
non_country_entities = {
    'AFE', 'AFW', 'ARB', 'CSS', 'CEB', 'CHI', 'EAR', 'EAS', 'TEA', 'EAP',
    'EMU', 'ECS', 'TEC', 'ECA', 'EUU', 'FCS', 'HPC', 'HIC', 'IBD', 'IBT',
    'IDB', 'IDX', 'IDA', 'LTE', 'LCN', 'LAC', 'TLA', 'LDC', 'LMY', 'LIC',
    'LMC', 'MEA', 'TMN', 'MNA', 'MIC', 'NAC', 'OED', 'OSS', 'PSS', 'PST',
    'PRE', 'SAS', 'TSA', 'SSF', 'TSS', 'SSA', 'SST', 'UMC', 'WLD'
}
df_countries = df[~df['Country Code'].isin(non_country_entities)]
df_non_countries = df[df['Country Code'].isin(non_country_entities)]
print(f"Dataset loaded: {df_countries.shape[0]} rows, {df_countries['Country Name'].nunique()} countries")
Exercises: Ranking
The exercises illustrate the Ranking section of the Visual Vocabulary - Financial Times Guide.
Ranking visualizations are essential for showing order and hierarchy in data. They help readers quickly identify leaders, laggards, and relative positions. In this exercise, you'll explore different ways to visualize rankings using GDP data.
🏗️ Code for data preprocessing, comments proposing steps to follow, and commented code giving clues have been provided for you in the snippets below.
Exercise 1.1: Ordered Bar Chart
Task: Create a horizontal bar chart showing the top 15 economies by GDP in 2019, ordered from highest to lowest (from bottom to top).
Requirements:
- Sort countries by GDP value
- Use a gradient colormap to emphasize ranking
- Format GDP values in trillions
- Add value labels at the end of each bar
- Include gridlines for easier reading
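One possible starting point for Exercise 1.1 is sketched below. It is not a reference solution: the function name `plot_top_economies`, the `Blues` colormap, the figure size, and the label format are all choices made for illustration. It assumes a DataFrame with the `Country Name`, `Year`, and `Value` columns described above, and it places the largest economy at the top of the chart, which is only one reading of the requested ordering (calling `ax.invert_yaxis()` gives the opposite).

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_top_economies(df_countries, year=2019, n=15):
    """Horizontal bar chart of the n largest economies in `year` (illustrative helper)."""
    top = (
        df_countries[df_countries["Year"] == year]
        .nlargest(n, "Value")
        .sort_values("Value")  # ascending: barh draws the first row at the bottom
    )
    gdp_trillions = top["Value"].to_numpy() / 1e12
    y = np.arange(len(top))

    # Sample one colour per bar from a sequential colormap (darker = larger GDP)
    colors = plt.cm.Blues(np.linspace(0.35, 0.95, len(top)))

    fig, ax = plt.subplots(figsize=(10, 7))
    ax.barh(y, gdp_trillions, color=colors)
    ax.set_yticks(y)
    ax.set_yticklabels(top["Country Name"])

    # Value label just past the end of each bar
    for position, value in zip(y, gdp_trillions):
        ax.text(value, position, f" {value:.2f}T", va="center", ha="left")

    ax.set_xlabel("Nominal GDP (trillions of USD)")
    ax.set_title(f"Top {n} economies by nominal GDP, {year}")
    ax.xaxis.grid(True, linestyle="--", alpha=0.5)
    ax.set_axisbelow(True)  # keep gridlines behind the bars
    ax.margins(x=0.1)       # leave room for the value labels
    return fig, ax
```

With the setup code already run, `fig, ax = plot_top_economies(df_countries)` followed by `plt.show()` should display the chart.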
Exercise 1.2: Lollipop Chart
Task: Create a lollipop chart comparing GDP in 2010 and in 2019 for the G7 countries, so that the growth over the period is visible.
Requirements:
- Show both 2010 and 2019 values on the same chart
- Use different markers for each year
- Sort by 2019 values
- Add a legend and appropriate labels
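The requirements above could be approached along the following lines. This is a sketch under stated assumptions: `plot_g7_lollipop` and `G7_CODES` are names introduced here, the G7 members are selected by their ISO alpha-3 codes in the `Country Code` column, and each stem runs from zero to the larger of the two yearly values so that both markers sit on it.

```python
import matplotlib.pyplot as plt
import numpy as np

# G7 members by ISO alpha-3 code (constant name is an illustrative choice)
G7_CODES = ["CAN", "FRA", "DEU", "ITA", "JPN", "GBR", "USA"]


def plot_g7_lollipop(df_countries, codes=G7_CODES, start=2010, end=2019):
    """Lollipop chart with one marker per year for each selected country."""
    mask = (
        df_countries["Country Code"].isin(codes)
        & df_countries["Year"].isin([start, end])
    )
    # One row per country, one column per year, values in trillions of USD
    wide = (
        df_countries[mask]
        .pivot(index="Country Name", columns="Year", values="Value")
        .div(1e12)
        .sort_values(end)
    )
    y = np.arange(len(wide))

    fig, ax = plt.subplots(figsize=(9, 5))
    # Stem from zero up to the larger of the two values
    ax.hlines(y, 0, wide.max(axis=1).to_numpy(), color="lightgray", linewidth=2, zorder=1)
    ax.scatter(wide[start], y, marker="o", s=70, color="tab:blue", label=str(start), zorder=2)
    ax.scatter(wide[end], y, marker="D", s=70, color="tab:orange", label=str(end), zorder=2)

    ax.set_yticks(y)
    ax.set_yticklabels(wide.index)
    ax.set_xlabel("Nominal GDP (trillions of USD)")
    ax.set_title(f"G7 nominal GDP, {start} vs {end}")
    ax.legend(title="Year", loc="lower right")
    return fig, ax
```

Sorting on the `end` column puts the largest 2019 economy at the top, since the last row is drawn at the highest y position.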
Exercise 1.3: Slope Chart
Task: Create a slope chart showing how the ranking of the top 10 economies changed between 2000 and 2019.
Requirements:
- Show rankings (not values) on the y-axis
- Connect same countries with lines
- Color code lines by change direction
- Label countries on both sides
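A slope chart along these lines might look like the sketch below. Several choices here are assumptions rather than part of the exercise statement: the "top 10" is taken to mean the ten largest economies in the end year, ranks are computed over all countries in each year, countries without data in either year are dropped, and the helper name `plot_rank_slope` and the green/red/gray colour coding are illustrative.

```python
import matplotlib.pyplot as plt


def plot_rank_slope(df_countries, start=2000, end=2019, n=10):
    """Slope chart of GDP rank in `start` vs `end` for the top n economies of `end`."""
    wide = df_countries[df_countries["Year"].isin([start, end])].pivot(
        index="Country Name", columns="Year", values="Value"
    )
    # Rank 1 = largest GDP; missing values keep a missing rank
    ranks = wide.rank(ascending=False, method="min")
    selected = ranks[ranks[end] <= n].dropna().sort_values(end)

    direction_colors = {"up": "tab:green", "down": "tab:red", "same": "gray"}
    fig, ax = plt.subplots(figsize=(8, 8))
    for name, row in selected.iterrows():
        before, after = row[start], row[end]
        # A smaller rank number means a better position
        if after < before:
            direction = "up"
        elif after > before:
            direction = "down"
        else:
            direction = "same"
        ax.plot([0, 1], [before, after], marker="o", color=direction_colors[direction])
        ax.text(-0.05, before, f"{name} ({int(before)})", ha="right", va="center")
        ax.text(1.05, after, f"{name} ({int(after)})", ha="left", va="center")

    ax.invert_yaxis()  # rank 1 at the top
    ax.set_xlim(-0.8, 1.8)  # leave room for the labels on both sides
    ax.set_xticks([0, 1])
    ax.set_xticklabels([str(start), str(end)])
    ax.set_ylabel("GDP rank (1 = largest)")
    ax.set_title(f"Rank changes among the top {n} economies, {start} to {end}")
    return fig, ax
```

Labels can overlap when two countries hold neighbouring ranks in the start year; nudging the text positions by hand is a reasonable refinement.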
Exercise 1.4: Dot Strip Plot
Task: Create a dot strip plot showing nominal GDP ranges for different regions in 2019.
Requirements:
- Group by regions
- Show individual countries as dots
- Highlight median values
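The dataset has no per-country region column, so the sketch below assumes that you supply one yourself: `region_of` is a hypothetical dictionary mapping each `Country Code` to a region name, built for example from World Bank country metadata. The function name `plot_dot_strip`, the log-scaled x-axis, and the marker styles are likewise illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_dot_strip(df_countries, region_of, year=2019):
    """Dot strip plot: one row per region, one dot per country, median highlighted.

    `region_of` is a user-supplied dict {country code: region name};
    it is NOT part of the GDP dataset.
    """
    data = df_countries[df_countries["Year"] == year].copy()
    data["Region"] = data["Country Code"].map(region_of)
    data = data.dropna(subset=["Region", "Value"])  # drop unmapped countries
    regions = sorted(data["Region"].unique())

    fig, ax = plt.subplots(figsize=(10, 5))
    for i, region in enumerate(regions):
        values = data.loc[data["Region"] == region, "Value"]
        # Individual countries
        ax.scatter(values, np.full(len(values), i), s=40, alpha=0.6, color="steelblue")
        # Regional median as a tall vertical tick
        ax.scatter(
            values.median(), i, marker="|", s=600, linewidths=3,
            color="crimson", zorder=3, label="Median" if i == 0 else None,
        )

    ax.set_xscale("log")  # GDP spans several orders of magnitude
    ax.set_yticks(list(range(len(regions))))
    ax.set_yticklabels(regions)
    ax.set_xlabel("Nominal GDP (USD, log scale)")
    ax.set_title(f"Nominal GDP by region, {year}")
    ax.legend(loc="lower right")
    return fig, ax
```

Countries whose code is missing from `region_of` are silently dropped, so it is worth checking how many rows survive the mapping before reading anything into the plot.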
Exercise 1.5: Bump Chart
Task: Create a bump chart showing ranking evolution of selected economies from 2010 to 2019.
Requirements:
- Track ranking changes year by year
- Use smooth lines to connect rankings
- Apply distinct colors for each country
- Show all intermediate years
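A bump chart meeting these requirements might be sketched as follows. The smoothing uses a simple smoothstep interpolation between consecutive years, which is a stand-in chosen for illustration rather than a prescribed method; `smooth_path` and `plot_bump_chart` are hypothetical helper names, and the economies to track are passed in as a list of `Country Code` values that you choose.

```python
import matplotlib.pyplot as plt
import numpy as np


def smooth_path(x, y, points_per_segment=20):
    """Interpolate (x, y) with a smoothstep ease so the curve flattens at each point."""
    t = np.linspace(0, 1, points_per_segment)
    ease = 3 * t**2 - 2 * t**3  # goes from 0 to 1 with zero slope at both ends
    xs, ys = [], []
    for i in range(len(x) - 1):
        xs.append(x[i] + (x[i + 1] - x[i]) * t)
        ys.append(y[i] + (y[i + 1] - y[i]) * ease)
    return np.concatenate(xs), np.concatenate(ys)


def plot_bump_chart(df_countries, codes, start=2010, end=2019):
    """Bump chart of yearly GDP rank for the economies listed in `codes`."""
    period = df_countries[df_countries["Year"].between(start, end)].copy()
    # Rank within each year over all countries (1 = largest GDP)
    period["Rank"] = period.groupby("Year")["Value"].rank(ascending=False, method="min")
    selected = period[period["Country Code"].isin(codes)]
    ranks = selected.pivot(index="Year", columns="Country Name", values="Rank")

    palette = plt.cm.tab10.colors  # ten distinct colours, reused if more are needed
    years = ranks.index.to_numpy(dtype=float)

    fig, ax = plt.subplots(figsize=(11, 6))
    for i, name in enumerate(ranks.columns):
        color = palette[i % len(palette)]
        values = ranks[name].to_numpy(dtype=float)
        xs, ys = smooth_path(years, values)
        ax.plot(xs, ys, color=color, linewidth=2.5)
        ax.scatter(years, values, color=[color], s=50, zorder=3)  # one dot per year
        ax.text(years[-1] + 0.2, values[-1], name, va="center", color=color)

    ax.invert_yaxis()  # rank 1 at the top
    ax.set_xlim(start - 0.5, end + 2)  # room for the country labels on the right
    ax.set_xticks(list(ranks.index))
    ax.set_xlabel("Year")
    ax.set_ylabel("GDP rank (1 = largest)")
    ax.set_title(f"GDP ranking, {start} to {end}")
    return fig, ax
```

For example, `plot_bump_chart(df_countries, ["USA", "CHN", "JPN", "DEU", "IND"])` would track five economies, the codes being an arbitrary selection.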
Exercise 1.6: Ordered Proportional Symbol
Task: Create a proportional symbol chart showing GDP sizes with country positions based on GDP growth rate.
Requirements:
- Calculate growth rate between 2010 and 2019
- Filter for significant economies (top 30 by GDP in 2019)
- Size circles by 2019 GDP
- Position on x-axis by growth rate
- Color by GDP size category
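The steps listed above can be sketched as two small helpers, one for the data and one for the plot. Everything named here is an assumption made for illustration: the helper names `growth_table` and `plot_growth_bubbles`, the size-category thresholds (1 and 5 trillion USD), the colours, and the decision to spread the circles vertically in growth order so that labels stay readable. The filter is read as "the 30 largest economies by 2019 GDP".

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

# Illustrative GDP size categories (thresholds in USD are an assumption)
SIZE_BINS = [0, 1e12, 5e12, np.inf]
SIZE_LABELS = ["under 1T USD", "1T to 5T USD", "over 5T USD"]
SIZE_COLORS = dict(zip(SIZE_LABELS, ["tab:blue", "tab:orange", "tab:red"]))


def growth_table(df_countries, start=2010, end=2019, n=30):
    """GDP, growth rate (%) and size category for the n largest economies in `end`."""
    wide = (
        df_countries[df_countries["Year"].isin([start, end])]
        .pivot(index="Country Name", columns="Year", values="Value")
        .dropna()  # keep only countries with data in both years
    )
    top = wide.nlargest(n, end)
    table = pd.DataFrame({
        "gdp": top[end],
        "growth_pct": (top[end] / top[start] - 1) * 100,
    })
    table["category"] = pd.cut(table["gdp"], bins=SIZE_BINS, labels=SIZE_LABELS)
    return table.sort_values("growth_pct")


def plot_growth_bubbles(table):
    """Circles positioned by growth rate (x) and growth order (y), sized by GDP."""
    y = np.arange(len(table))
    # scatter's `s` is a marker area, so circle area is proportional to GDP
    sizes = (table["gdp"] / table["gdp"].max() * 3000).to_numpy()
    colors = table["category"].astype(str).map(SIZE_COLORS).tolist()

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter(table["growth_pct"], y, s=sizes, c=colors, alpha=0.6, edgecolors="black")
    for position, (name, row) in zip(y, table.iterrows()):
        ax.text(row["growth_pct"], position, name, ha="center", va="center", fontsize=8)

    ax.axvline(0, color="gray", linewidth=1)  # zero-growth reference line
    ax.set_yticks([])
    ax.set_xlabel("Nominal GDP growth over the period (%)")
    ax.set_title("GDP size and growth of the largest economies")
    # Proxy handles keep legend markers at a uniform, readable size
    handles = [
        Line2D([], [], marker="o", linestyle="", color=SIZE_COLORS[label], label=label)
        for label in SIZE_LABELS
    ]
    ax.legend(handles=handles, title="GDP size")
    return fig, ax
```

A typical call would be `plot_growth_bubbles(growth_table(df_countries))`; keeping the table separate makes it easy to inspect the computed growth rates before plotting.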