#SWDChallenge March 2019

This month I decided to take part in the Storytelling With Data monthly challenge for the first time! The dataset we were given to explore contains global aid exchanges between 47 countries across the world across the years 1973-2013. The goal is to create visualizations that answers the broad question: Who Donates?, as well as some bonus questions about distribution of donations geographically, temporally and by purpose of donation. Here’s my initial attempt! Along with some code. (The only package I use here is tidyverse).

The Data

The data provided is nice and clean, so all we are left to do is read it in using read_csv(). I changed some variable names to make it nicer to work with, and noticed that there are a few negative quantities of money in the data, which I drop since they are impossible. Here’s a glimpse of what the data looks like:

id year donor recipient amount purpose_code purpose_desc
2414478 1977 Saudi Arabia India 348718518 23030 Power generation/renewable sources
2414509 1977 Saudi Arabia Brazil 191647004 23040 Electrical transmission/ distribution
2414635 1983 Saudi Arabia India 79371799 21030 Rail transport
2414665 1984 Saudi Arabia Taiwan 212202942 21030 Rail transport
2414667 1984 Saudi Arabia Korea 134511154 21040 Water transport
2414684 1985 Saudi Arabia India 128074768 23000 Energy generation and supply, combinations of activities

Who Donates?

One of the challenges in answering this question is how to summarize across time. I chose to look at the proportion of the total money contributed to global aid that each country contributed and received.

donated <-
  data %>% 
  group_by(donor) %>% 
  summarise(donated = sum(amount)) %>% 
  mutate(prop_donated = donated / sum(donated)) %>% 
  select(country = donor, prop_donated)
## `summarise()` ungrouping output (override with `.groups` argument)
received <-
  data %>% 
  group_by(recipient) %>% 
  summarise(received = sum(amount)) %>% 
  mutate(prop_received = received / sum(received)) %>% 
  select(country = recipient, prop_received)
## `summarise()` ungrouping output (override with `.groups` argument)
aid <-
  donated %>% 
  full_join(received, by = c("country")) %>% 
  mutate_at(vars(-country), ~if_else(is.na(.), 0, .)) %>% 
  gather(-country, key = direction, value = proportion_of_aid) %>% 
  mutate(direction = str_extract(direction, "[^_]+$"))

country_order <-
  aid %>% 
  spread(direction, proportion_of_aid) %>% 
  mutate(diff = donated - received) %>% 
  arrange(diff) %>% 
  pull(country)

aid <-
  aid %>% 
   mutate(country = factor(country, levels = country_order, ordered = TRUE))

segments <-
  aid %>% 
  spread(direction, proportion_of_aid)

aid %>% 
  ggplot(aes(y = country)) +
  geom_segment(
    aes(yend = country, x = donated, xend = received), 
    color = "grey40",
    data = segments
  ) +
  geom_point(aes(x = proportion_of_aid, color = direction), size = 2) +
  scale_y_discrete(expand = expand_scale(0)) +
  scale_x_sqrt(
    labels = scales::percent, 
    expand = expand_scale(0), 
    limits = c(0,0.4),
    breaks = c(0, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4)
  ) +
  scale_color_brewer(type = "qual", palette = "Set1", labels = str_to_title) +
  theme_minimal() +
  theme(legend.position = "top") +
  coord_cartesian(clip = "off") +
  labs(
    y = NULL,
    x = glue::glue("Percentage of Total Aid {min(pull(data, year))} - {max(pull(data, year))}"),
    color = NULL,
    title = "The United States and Japan are the world's major donors",
    subtitle = "India has received almost 40% of all global aid"
  )
## Warning: `expand_scale()` is deprecated; use `expansion()` instead.

## Warning: `expand_scale()` is deprecated; use `expansion()` instead.

Excuse the squished y-axis. I played around with it for a while and eventually gave up. Any hints are very welcome!

Has the Amount Donated Changed Over Time?

In keeping with the same metric, proportion of aid contributed and received, we can also look at the trends over time. I’ve highlighted the top three donors and recipients in the figure. Interestingly, it seems that receiving tends to be steadier over time, while donations see more anomalous spikes

donated <-
  data %>% 
  group_by(donor, year) %>% 
  summarise(donated = sum(amount)) %>% 
  ungroup() %>% 
  mutate(prop_donated = donated / sum(donated)) %>% 
  select(country = donor, year, prop_donated)
## `summarise()` regrouping output by 'donor' (override with `.groups` argument)
received <-
  data %>% 
  group_by(recipient, year) %>% 
  summarise(received = sum(amount)) %>% 
  ungroup() %>% 
  mutate(prop_received = received / sum(received)) %>% 
  select(country = recipient, year, prop_received)
## `summarise()` regrouping output by 'recipient' (override with `.groups` argument)
timeseries <-
  donated %>% 
  full_join(received, by = c("country", "year")) %>% 
  mutate_at(vars(prop_donated, prop_received), ~if_else(is.na(.), 0, .)) %>% 
  gather(prop_donated, prop_received, key = direction, value = proportion) %>% 
  mutate(
    proportion = if_else(direction == "prop_donated", proportion, -proportion)
  )
  
top_3 <- 
  aid %>% 
  filter(direction == "donated") %>% 
  top_n(3, proportion_of_aid) %>% 
  pull(country)

bottom_3 <-
  aid %>% 
  filter(direction == "received") %>% 
  top_n(3, proportion_of_aid) %>% 
  pull(country)

timeseries_main <-
  timeseries %>%
  filter(country %in% top_3 & direction == "prop_donated" | 
           country %in% bottom_3 & direction == "prop_received")

country_order <- 
  timeseries_main %>% 
  filter(year == max(pull(data, year))) %>% 
  arrange(desc(proportion)) %>% 
  pull(country)

timeseries_main <-
  timeseries_main %>% 
  mutate(country = factor(country, levels = country_order, ordered = TRUE))

labeller <-
  function(y) {
    y = if_else(y < 0, -y, y)
    scales::percent(y)
  }

timeseries %>% 
  unite(group, country, direction, remove = FALSE) %>% 
  ggplot(aes(year, proportion)) +
  geom_line(aes(group = group), alpha = 0.2) +
  geom_line(aes(color = country), data = timeseries_main) +
  scale_x_continuous(breaks = seq(1970, 2015, by = 5)) +
  scale_y_continuous(labels = labeller) +
  scale_color_brewer(type = "qual", palette = "Dark2") +
  theme_minimal() +
  labs(
    x = NULL,
    color = NULL,
    y = glue::glue("Percentage of Annual Aid"),
    title = "Contributions from significant donors is not constant over time",
    subtitle = "Major events like war and recession drive spikes in aid"
  ) +
  annotate(
    geom = "text",
    x = 1974,
    y = 0.0275,
    label = "Proportion Donated",
    hjust = 0
  ) +
  annotate(
    geom = "text",
    x = 1974,
    y = -0.0175,
    label = "Proportion Received",
    hjust = 0
  )

Related