Skip to main content

Bonus verb: PIVOT or RESHAPE

Here is one last 'verb' which is useful.

A pivot or reshape operation can be used to transform 'long form' data into 'wide form' and vice versa.

For example, suppose we wanted to compare different years' worth of data in our data. We could do that with a join but, well, there are 16 years to do with so that would need 15 joins!

Instead let's do it all in one go with a pivot.

First let's get just the life expectancy data from our table:

life_expectancy = reformatted %>% select( country, year, life_expectancy )

Look at this data now. From the point of view of the year, it is in 'long' form, with values for different years on different rows. Let's reshape that to 'wide' form with years in columns. The pivot commands actually live inside the tidyr package rather than dplyr, so let's load that now:

library( tidyr )
wide_form = (
life_expectancy
%>% pivot_wider(
names_from = "year",
values_from = "life_expectancy"
)
)

Getting this pivot right takes a bit of experimentation, but the things to know are:

  • names_from tells it which columns to 'pivot' over - they will end up as the column names.
  • values_from tells it which column to get the values from.

The other columns stay unchanged - they'll end up on the left of the data frame. (Here it's just country)

So now we could, for example, find countries where the expectancy is lower in 2015 than 2000:

(
wide_form
%>% filter( `2015` < `2000` )
)
Note

I'm using those \backticks`` again so that R realises those are columns names, not numbers.

To get back from wide form, we need... (wait for it)... pivot_longer():

(
wide_form
%>% pivot_longer(
cols = colnames(wide_form)[-1]
)
)

...or we can help it know what to name the columns:

(
wide_form
%>% pivot_longer(
cols = colnames(wide_form)[-1],
names_to = "year",
values_to = "life_expectancy"
)
)

Voila! We are back to where we started!