7. Text and annotations

Author

Camille Seaberry

Modified

March 6, 2024

Big picture: providing context and making meaning

“Until the systems of power recognise different categories, the data I’m reporting on is also flawed,” she added.

In a bid to account for these biases, and any biases of her own, Chalabi is transparent about her sources and often includes disclaimers about her own decision-making process and about any gaps or uncertainties in the data.

“I try to produce journalism where I’m explaining my methods to you,” she said. “If I can do this, you can do this, too. And it’s a very democratising experience, it’s very egalitarian.”

In an ideal scenario, she is able to integrate this background information into the illustrations themselves, as evidenced by her graphics on anti-Asian hate crimes and the ethnic cleansing of Uygurs in China.

But at other times, context is relegated to the caption to ensure the graphic is as grabby as possible.

“What I have found is literally every single word that you add to an image reduces engagement, reduces people’s willingness or ability to absorb the information,” Chalabi said.

“So there is a tension there. How can you be accurate and get it right without alienating people by putting up too much information? That’s a really, really hard balance.”

Mona Chalabi in Hahn (2023)

Hahn, J. (2023). "Data replicates the existing systems of power" says Pulitzer Prize-winner Mona Chalabi. Dezeen. https://www.dezeen.com/2023/11/16/mona-chalabi-pulitzer-prize-winner/

Text

A data visualization is not a piece of art meant to be looked at only for its aesthetically pleasing features. Instead, its purpose is to convey information and make a point. To reliably achieve this goal when preparing visualizations, we have to place the data into context and provide accompanying titles, captions, and other annotations. – Wilke (2019) ch. 22

Wilke, C. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures (First edition). O’Reilly. https://clauswilke.com/dataviz/

The type of text you use, its phrasing, and its placement all depend on where your visualizations will go, who will read them, and how they might be distributed. For example, I might put less detail in the titles and labels of a chart that will be part of a larger publication than in a chart that might get distributed on its own (I’ll also tend towards more straightforward chart types and simpler analyses for something standalone).

Uses of text

Here’s a good rundown on how to use text

library(dplyr)
library(ggplot2)
library(justviz)
source(here::here("utils/plotting_utils.R"))
# source(here::here("utils/misc.R"))
balt_metro <- readRDS(here::here("utils/balt_metro.rds"))

# set a default theme from the one I defined in plotting_utils.R
theme_set(theme_nice())

Identify all the text in this chart, what purpose it serves, and whether that could be done better through other means.

acs |>
  filter(level %in% c("us", "state") | name %in% balt_metro) |>
  mutate(name = forcats::fct_reorder(name, total_cost_burden)) |>
  mutate(level2 = forcats::fct_other(name, keep = c("United States", "Maryland", "Baltimore city"))) |>
  stylehaven::offset_lbls(value = total_cost_burden, frac = 0.025, fun = scales::label_percent()) |>
  ggplot(aes(x = name, y = total_cost_burden, fill = level2)) +
  geom_col(width = 0.8) +
  geom_text(aes(label = lbl, hjust = just, y = y), color = "white", fontface = "bold") +
  scale_y_barcontinuous() +
  coord_flip() +
  labs(title = "Baltimore city has a higher rate of cost burden than the state or nation",
       subtitle = "Share of households that are cost burdened, Maryland, 2022",
       caption = "Source: US Census Bureau American Community Survey, 2022 5-year estimates",
       fill = "fill") +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.major.x = element_line())

Brainstorm

Text	Purpose	Could be better?
Title	Takeaway, what you’re looking at in context
Subtitle	Specifics of what’s being measured	Depending on context, maybe put cost burden definition here
Independent axis	Locations
Independent axis title	What’s on the axis	Not necessary; we know what these names are
Legend title	What colors mean
Legend labels	Location types	Drop the legend, put any additional info in subtitle
Dependent axis title	Meaning of variable being measured	Can remove since it’s in the subtitle, but some styleguides may say keep it
Caption	Source	Could put definition of cost burden here
Dependent axis labels	Specify meaning of breaks along axis	Can drop because redundant
Direct data labels on bars	Values of each data point
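One way the suggestions in the table might translate into code is sketched below. This is a minimal sketch, not the chart above: the data frame `toy`, its area names, and its values are all invented placeholders, and it uses only ggplot2 and scales rather than the course's helper functions.

```r
library(ggplot2)

# Placeholder names and values, invented for illustration only
toy <- data.frame(
  name = c("Area A", "Area B", "Area C", "Area D"),
  share = c(0.25, 0.31, 0.38, 0.29),
  highlight = c(FALSE, FALSE, TRUE, FALSE)
)
# Order bars by value so the ranking is readable without extra text
toy$name <- reorder(toy$name, toy$share)

p <- ggplot(toy, aes(x = name, y = share, fill = highlight)) +
  geom_col(width = 0.8) +
  # Direct labels on the bars replace the dependent axis labels
  geom_text(aes(label = scales::label_percent(accuracy = 1)(share)),
            hjust = 1.2, color = "white", fontface = "bold") +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(title = "Area C has the highest share",
       subtitle = "Share of households, placeholder data",
       caption = "Source: invented values for illustration",
       x = NULL, y = NULL) +  # axis titles dropped; the subtitle covers them
  theme(legend.position = "none",         # legend dropped
        axis.text.x = element_blank(),    # value axis labels are redundant
        axis.ticks.x = element_blank())
```

Whether to drop the dependent axis entirely is a judgment call; as noted in the table, some style guides would keep it.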

Other annotations

There are other annotations that are useful too. You might mark off a region to show a cluster of points, or a period in time. There are two approaches to this with ggplot: using geoms (geom_text, geom_hline, etc.) or annotation layers (annotate, annotation_custom). The main difference is that annotations aren’t mapped to data the way geoms are. Because of that, I almost always use geoms for annotations, and usually make a small data frame just for the data that goes into the annotations to avoid hard-coding too much.
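To make the difference between the two approaches concrete, here is a minimal sketch that shades a period in time both ways. Everything in it is an assumption for illustration: the `toy` counts, the `periods` data frame, and the "Period of interest" label are invented, not taken from any real dataset.

```r
library(ggplot2)

# Toy data invented for illustration: 24 months of made-up counts
toy <- data.frame(
  month = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  n = c(40, 42, 15, 8, 7, 9, 10, 12, 11, 13, 14, 12,
        15, 14, 16, 18, 17, 20, 35, 48, 52, 55, 57, 60)
)

# One row per period to mark off; the label is a placeholder
periods <- data.frame(
  start = as.Date("2020-03-01"),
  end = as.Date("2021-06-01"),
  lbl = "Period of interest"
)

# geom approach: rectangle and label are mapped from the `periods` data frame
p_geom <- ggplot(toy, aes(x = month, y = n)) +
  geom_rect(aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            data = periods, inherit.aes = FALSE, fill = "gray80", alpha = 0.5) +
  geom_text(aes(x = start, y = Inf, label = lbl),
            data = periods, inherit.aes = FALSE, hjust = 0, vjust = 1.5) +
  geom_line()

# annotate approach: same rectangle, but the positions are hard-coded
p_annotate <- ggplot(toy, aes(x = month, y = n)) +
  annotate("rect", xmin = as.Date("2020-03-01"), xmax = as.Date("2021-06-01"),
           ymin = -Inf, ymax = Inf, fill = "gray80", alpha = 0.5) +
  geom_line()
```

With the geom version, marking a second period only means adding a row to `periods`; with annotate, each new mark is another hard-coded call.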

An example from DataHaven’s most recent books: we wanted to explicitly put evictions into a policy context, so we marked off the end of the federal eviction moratorium and the pre-pandemic average count as a threshold. Without those labeled lines, you could tell that there was an abrupt drop in evictions, then a steep rise in them about a year and a half later, then counts that are higher than at the beginning of 2020. But unless you had followed eviction trends and COVID relief policies, you might not know why any of those things occurred.
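The general pattern of a labeled event line plus a labeled threshold line could look something like the sketch below. This is not the DataHaven chart or its data: the counts in `toy`, the event date, and both labels are invented stand-ins, and the threshold is simply the mean of the first year of the toy series.

```r
library(ggplot2)

# Invented monthly counts for three years; not real eviction data
toy <- data.frame(
  month = seq(as.Date("2019-01-01"), by = "month", length.out = 36),
  n = c(rep(c(48, 52, 50, 49, 51, 50), 2),
        50, 47, 12, 6, 5, 7, 8, 9, 10, 11, 12, 12,
        13, 14, 15, 16, 18, 20, 22, 40, 55, 60, 62, 65)
)

# Threshold computed from the data rather than typed in by hand
baseline <- data.frame(
  x = min(toy$month),
  avg = mean(toy$n[toy$month < as.Date("2020-01-01")]),
  lbl = "Baseline-year average"
)

# Hypothetical event to mark; the date and label are placeholders
events <- data.frame(
  month = as.Date("2021-08-01"),
  lbl = "Hypothetical policy ends"
)

p <- ggplot(toy, aes(x = month, y = n)) +
  geom_line() +
  # Horizontal threshold and its label, both drawn from `baseline`
  geom_hline(aes(yintercept = avg), data = baseline, linetype = "dashed") +
  geom_text(aes(x = x, y = avg, label = lbl), data = baseline,
            inherit.aes = FALSE, hjust = 0, vjust = -0.5) +
  # Vertical event marker and its label, both drawn from `events`
  geom_vline(aes(xintercept = month), data = events, linetype = "dotted") +
  geom_text(aes(x = month, y = Inf, label = lbl), data = events,
            inherit.aes = FALSE, hjust = 1.05, vjust = 1.5)
```

Keeping the labels in the same small data frames as the line positions means the text moves with the lines if the underlying numbers change.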

Exercises

This chart doesn’t have labels for its axes, but you know it’s unemployment rates in Baltimore and Maryland. How accurately can we guess what the labels would be?

Next, what annotations would be helpful for contextualizing this trend?

Brainstorm: contextualizing information

Timespan: years on axis
Source
Units of measurement
Historical events