Graphic Semiology Fundamentals

Introduction to Graphic Semiology

Graphic semiology studies how visual elements act as a language to encode, communicate, and process information. Pioneered by Jacques Bertin, it formalizes the grammar of graphics: how visual variables (symbols, signs, marks) represent data in diagrams, maps, and networks. Bertin’s thesis is that graphics constitute a system with its own syntax and semantics, much as in verbal language.

Sémiologie graphique by Jacques Bertin, EHESS 2013 reprint Source: Wikipedia | License: CC BY 4.0

1. Theoretical Foundations

1.1. Graphics as Language

  • Graphics are a visual code, and each constructed image is a message.
  • Visual elements (the signifiers) correspond to information components (the signified).
  • The goal is to achieve “visual unity,” maximizing both efficiency and global apprehension of a message.

1.2. Historical Evolution

  • Bertin's laboratory, the Laboratoire de Graphique, crystallized these principles through interdisciplinary collaboration, handling demands across social sciences, geography, and cartography.
  • The 1967 “Sémiologie graphique” was a paradigm shift, bringing a rational, systematic language for visual analytics.

1.3. An example from a research paper

This graph comes from the research paper: Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche by Christophe Coupé, Yoon Mi Oh, Dan Dediu, and François Pellegrino.
Source: Science.org

1.4. [Optional] Information Rate (IR) Definition

  • In the graph below, "SR" stands for Speech Rate and "IR" stands for Information Rate.
  • The study defines both quantities using Shannon's information theory:
    • Information Density ($ID$): Average bits of information per syllable (conditional entropy, accounting for syllable dependencies).
    • Speech Rate ($SR$): Syllables spoken per second.
  • $IR = ID \times SR$ (bits/second).
  • Across $17$ languages, $IR$ averages $\approx 39$ bits/s. Languages therefore encode information at similar rates despite differences in $ID$ or $SR$: languages with a higher $ID$ tend to be spoken with a lower $SR$, and vice versa.
Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche - by Christophe Coupé, Yoon Mi Oh, Dan Dediu, and François Pellegrino. Source: Science.org

More detailed explanations are available here: Computing Shannon Entropy for Information Density.
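To make these definitions concrete, here is a minimal Python sketch. It assumes a simple bigram model, estimating $ID$ as the conditional entropy of a syllable given the previous one, $H(S_n \mid S_{n-1}) = -\sum p(a, b)\,\log_2 p(b \mid a)$. This is a simplification, and the study's own estimator may differ. The function names, the toy syllable stream, and the $ID$/$SR$ values are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(syllables):
    """Estimate H(S_n | S_{n-1}) in bits from bigram counts.

    Simplified bigram-only estimate (assumption): p(a, b) and p(b | a) are
    plain relative frequencies, with no smoothing.
    """
    bigrams = list(zip(syllables, syllables[1:]))
    if not bigrams:
        return 0.0
    pair_counts = Counter(bigrams)
    # How often each syllable appears as the first element of a bigram.
    context_counts = Counter(prev for prev, _ in bigrams)
    total = len(bigrams)
    h = 0.0
    for (prev, nxt), count in pair_counts.items():
        p_joint = count / total                # p(prev, next)
        p_cond = count / context_counts[prev]  # p(next | prev)
        h -= p_joint * math.log2(p_cond)
    return h


def information_rate(id_bits_per_syllable, sr_syllables_per_second):
    """IR = ID x SR, in bits per second."""
    return id_bits_per_syllable * sr_syllables_per_second


# Hypothetical syllable stream: after "a" comes "b" or "c" equally often;
# after "b" or "c" always comes "a".
toy = ["a", "b", "a", "c", "a", "b", "a", "c", "a"]
print(conditional_entropy(toy))  # 0.5

# Hypothetical ID and SR, chosen only to land near the ~39 bits/s average.
print(information_rate(5.0, 7.8))
```

The toy stream gives 0.5 bits per syllable. Half the time the next syllable is fully predictable (0 bits), and half the time it is a fair coin flip between "b" and "c" (1 bit). This is why conditioning on the previous syllable lowers $ID$ compared with counting syllables in isolation.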

1.5. An adaptation from The Economist

This graph comes from the article: Why are some languages spoken faster than others? by The Economist (Sep 28th 2019).
Why are some languages spoken faster than others? by The Economist - Sep 28th 2019

1.6. Discussion

Looking at the two graphs above:

  • What are the signifiers?
  • What are the signified?
  • What can we conclude from those graphs?
  • How do they compare in terms of visual unity?

2. Structure of Information

2.1. Theoretical Foundations

  • Invariant (Theme/Essence): The unifying core of the data (e.g., a common subject or theme linking all data points).
  • Components (Variables/Items): Distinct fields or characteristics in the dataset (nominal, ordinal, quantitative).
  • Components can relate differentially (qualitative categories), ordinally (rank/order), or quantitatively (measured scales).
  • Graphic System: Theoretical Framework
    • The cartographic/graphic “plane” is the two-dimensional space in which elements are organized.
    • It translates “plane geometry” into meaningful arrangements: quantities, categories, relationships, or spatial representations.
  • Classes of Representation
    • Diagrams: Abstract data structures (e.g., time series, bar charts).
    • Networks: Relationships among entities (e.g., graphs, flow diagrams).
    • Maps: Spatial embedding of data (e.g., geographic maps).

2.2. Datasets used in the following examples

2.3. An example with a bar chart

2.4. 💬 Discussion

Looking at the above graph:

  • What are the signifiers?
  • What are the signified?
  • What can we conclude from those graphs?
  • How do they compare in terms of readability and visual unity?

3. Relationships

3.1. Relations and dimensionality

  • Key to the analysis: how components relate to one another, that is, not so much their individual meanings as their differences and connections.
  • For every dataset, determining its dimensionality (number/nature of components) directs the choice of graphic strategy.
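A minimal Python sketch of this step follows, under stated assumptions. It tags each component with one of Bertin's three levels of organization (differential, ordered, quantitative). It then applies his rule of thumb that a single image can carry at most three components: the two planar dimensions plus one retinal variable. The names `Level` and `suggest_strategy`, and the classification of the two datasets from 3.2, are hypothetical illustrations rather than a faithful implementation of Bertin's method.

```python
from enum import Enum


class Level(Enum):
    """Bertin's three levels of organization for a component."""
    DIFFERENTIAL = "≠"  # qualitative categories (nominal)
    ORDERED = "O"       # rank / order (ordinal)
    QUANTITATIVE = "Q"  # measured scale


# Hypothetical classification of the components of the two datasets in 3.2.
RECORDS_100M = {
    "Date": Level.QUANTITATIVE,
    "Time (s)": Level.QUANTITATIVE,
    "Country": Level.DIFFERENTIAL,
}

INDICATORS = [
    "GDP per Capita", "Life Expectancy", "Education Index",
    "Happiness Score", "Innovation Index", "Environmental Score",
]
DEVELOPMENT = {"Country": Level.DIFFERENTIAL}
DEVELOPMENT.update({name: Level.QUANTITATIVE for name in INDICATORS})

# Assumed rule of thumb: one image = two planar dimensions + one retinal variable.
MAX_COMPONENTS_PER_IMAGE = 3


def suggest_strategy(components):
    """Hypothetical helper: choose a broad graphic strategy from dimensionality."""
    levels = sorted({level.value for level in components.values()})
    if len(components) <= MAX_COMPONENTS_PER_IMAGE:
        strategy = "single image"
    else:
        strategy = "matrix or collection of images"
    return len(components), levels, strategy


print(suggest_strategy(RECORDS_100M))
print(suggest_strategy(DEVELOPMENT))
```

With three components, the record progression fits in a single image, such as a scatter plot with the country shown by a retinal variable. The seven-component development profiles exceed that limit, which is one way to motivate a matrix-style display such as a heatmap.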

3.2. Datasets used in the following examples

100m men's world record progression

Date        Time (s)  Country
1912-07-06  10.6      🇺🇸
1921-04-23  10.4      🇺🇸
1930-08-09  10.3      🇨🇦
1936-06-20  10.2      🇺🇸
1956-08-03  10.1      🇺🇸
1960-06-21  10.0      🇩🇪
1968-06-20  9.95      🇺🇸
1983-07-03  9.93      🇺🇸
1988-09-24  9.92      🇺🇸
1991-06-14  9.90      🇺🇸
1991-08-25  9.86      🇺🇸
1994-07-06  9.85      🇺🇸
1996-07-27  9.84      🇨🇦
1999-06-16  9.79      🇺🇸
2005-06-14  9.77      🇯🇲
2008-05-31  9.72      🇯🇲
2008-08-16  9.69      🇯🇲
2009-08-16  9.58      🇯🇲

Source: Wikipedia / World Athletics (as of September 2025)

Global North countries: national development profiles

Country             GDP per Capita  Life Expectancy  Education Index  Happiness Score  Innovation Index  Environmental Score
Norway 🇳🇴           75              82               95               76               68                78
Switzerland 🇨🇭      81              84               88               75               67                81
Denmark 🇩🇰          60              81               92               78               58                78
Iceland 🇮🇸          52              83               95               75               55                68
Netherlands 🇳🇱      53              82               93               74               58                76
Sweden 🇸🇪           51              83               94               73               63                78
Germany 🇩🇪          46              81               93               70               87                77
Canada 🇨🇦           46              82               92               72               61                72
Australia 🇦🇺        55              83               92               73               46                60
Japan 🇯🇵            39              85               85               59               54                65
South Korea 🇰🇷      31              83               85               58               64                63
United Kingdom 🇬🇧   42              81               90               70               59                58
France 🇫🇷           40              83               88               66               54                80
Belgium 🇧🇪          47              82               89               69               50                73
Austria 🇦🇹          48              81               91               71               45                79

Source: United Nations Development Programme, World Bank, OECD, WIPO, and World Happiness Report, compiled and normalized (2024–2025)

3.3. An example with a scatter plot

Here we add a regression line to the scatter plot to show the trend of the data. It shows how to combine lines and points in the same plot.

Look carefully at the axes: the $y$-axis is inverted because faster (lower) times are better. This is a common practice in data visualization. It is not mandatory, but it makes the plot more readable, since improvements then point upward.
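The regression line can be computed with the standard library alone. This minimal sketch assumes an ordinary least-squares fit of record time against the date, expressed as a fractional year. The helper names `decimal_year` and `least_squares` are hypothetical, and the data are the rows of the 100m table in 3.2.

```python
from datetime import date

# Rows of the "100m men's world record progression" table (section 3.2).
RECORDS = [
    ("1912-07-06", 10.6), ("1921-04-23", 10.4), ("1930-08-09", 10.3),
    ("1936-06-20", 10.2), ("1956-08-03", 10.1), ("1960-06-21", 10.0),
    ("1968-06-20", 9.95), ("1983-07-03", 9.93), ("1988-09-24", 9.92),
    ("1991-06-14", 9.90), ("1991-08-25", 9.86), ("1994-07-06", 9.85),
    ("1996-07-27", 9.84), ("1999-06-16", 9.79), ("2005-06-14", 9.77),
    ("2008-05-31", 9.72), ("2008-08-16", 9.69), ("2009-08-16", 9.58),
]


def decimal_year(iso_date):
    """Convert 'YYYY-MM-DD' to a fractional year (Jan 1 -> YYYY.0)."""
    d = date.fromisoformat(iso_date)
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [decimal_year(d) for d, _ in RECORDS]
ys = [t for _, t in RECORDS]
slope, intercept = least_squares(xs, ys)
print(f"trend: {slope * 10:+.3f} s per decade")
print(f"fitted time in 2009: {slope * 2009 + intercept:.2f} s")
```

To draw the figure itself, one would typically plot the points and the fitted line with matplotlib and call `ax.invert_yaxis()` on the axes so that faster times appear higher. The fitted line is only a descriptive summary of the 1912 to 2009 records.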

Not only the 100m men's world record but most world records, whatever the sport, have improved continuously during the last century. What can we conclude about the evolution of the field of athletics?

3.4. An example with a heatmap

This example demonstrates how several quantitative components relate across different entities. The heatmap reveals patterns, clusters, and trade-offs in multi-dimensional data without drawing lines between nominal categories (the countries), which would wrongly suggest an order or continuity between them.

The y-axis is sorted by GDP per capita, descending.
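The following stdlib-only Python sketch shows two preparation steps behind such a heatmap. It orders the rows by GDP per capita and scales each indicator to a common 0 to 1 range before mapping it to a shade. Per-column min-max scaling is an assumption here (a single shared scale is also defensible). The shade characters merely stand in for a colour ramp, and the function names are hypothetical. The values are those of the development-profiles table in 3.2.

```python
# Rows of the "national development profiles" table (section 3.2).
COLUMNS = ["GDP per Capita", "Life Expectancy", "Education Index",
           "Happiness Score", "Innovation Index", "Environmental Score"]
ROWS = {
    "Norway": [75, 82, 95, 76, 68, 78],
    "Switzerland": [81, 84, 88, 75, 67, 81],
    "Denmark": [60, 81, 92, 78, 58, 78],
    "Iceland": [52, 83, 95, 75, 55, 68],
    "Netherlands": [53, 82, 93, 74, 58, 76],
    "Sweden": [51, 83, 94, 73, 63, 78],
    "Germany": [46, 81, 93, 70, 87, 77],
    "Canada": [46, 82, 92, 72, 61, 72],
    "Australia": [55, 83, 92, 73, 46, 60],
    "Japan": [39, 85, 85, 59, 54, 65],
    "South Korea": [31, 83, 85, 58, 64, 63],
    "United Kingdom": [42, 81, 90, 70, 59, 58],
    "France": [40, 83, 88, 66, 54, 80],
    "Belgium": [47, 82, 89, 69, 50, 73],
    "Austria": [48, 81, 91, 71, 45, 79],
}
SHADES = " ░▒▓█"  # light -> dark; an arbitrary stand-in for a colour scale


def sorted_by_gdp(rows):
    """Countries ordered by GDP per capita (first column), descending."""
    return sorted(rows, key=lambda country: rows[country][0], reverse=True)


def normalize_columns(rows):
    """Min-max scale each indicator to [0, 1] independently (assumption)."""
    columns = list(zip(*rows.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        country: [(v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(values, lows, highs)]
        for country, values in rows.items()
    }


def render(rows):
    """Text heatmap: a header, then one row per country, two shades per cell."""
    scaled = normalize_columns(rows)
    # Two-letter column abbreviations, aligned with the two-character cells.
    lines = [" " * 15 + "".join(name[:2] for name in COLUMNS)]
    for country in sorted_by_gdp(rows):
        cells = "".join(SHADES[round(v * (len(SHADES) - 1))] * 2
                        for v in scaled[country])
        lines.append(f"{country:<15}{cells}")
    return "\n".join(lines)


print(render(ROWS))
```

Scaling each column separately lets every indicator use the full shade range. This highlights contrasts within an indicator (for example, Germany has the highest innovation score) at the cost of hiding absolute differences between indicators.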

3.5. Another example with a heatmap

3.6. Alternative: Clustered scatter matrix

3.7. Discussion

Looking at the graphs above:

  • What are the signifiers?
  • What are the signified?
  • What can we conclude from those graphs?
  • How do they compare in terms of readability and visual unity?
  • Big findings? Are we that sure? (For some heatmap values in 3.4, the analysis is clear once the economic context of those countries is known. For others, it is far less obvious.)

4. Visual Variables

This part of the lecture is available here: Visual Variables.

5. Graphic Rules and Grammar

5.1. Problem Construction

  • Any graphic is one of many possible ways to encode the same dataset; the “graphic problem” is to choose the most efficient and legible one.
  • Design must account for perceptual tasks (lookup, comparison, grouping, pattern search).

5.2. Grammar of Construction

  • Density: Avoid excessive clutter while keeping enough detail for insight.
  • Retinal Legibility: Variables should combine without creating confusion.
  • Layering/Separation: Different elements must remain perceptually separable (via color, value, or shape).

Maths.pm, by pointcarre.app

Source code: AGPLv3 license
Content: Creative Commons license