
SUMMARISE or AGGREGATE combines across rows

Maybe we don't want data for every country but a summary of all countries put together? For example, what's the minimum, maximum, or average life expectancy across all countries and years?

This is a job for... summarise:

(
    reformatted
    %>% summarise(
        min_life_expectancy = min( life_expectancy ),
        average_life_expectancy = mean( life_expectancy ),
        max_life_expectancy = max( life_expectancy ),
        number_of_rows = n()
    )
)
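To see what summarise() does without needing the real dataset, here is a minimal sketch on a tiny made-up data frame. The name `toy` and its values are hypothetical, chosen only so the result is easy to check by hand. The three input rows collapse into a single summary row.

```r
library(dplyr)

# Hypothetical toy data in the same shape as `reformatted` (values are made up)
toy <- data.frame(
    country = c("A", "A", "B"),
    year = c(2000, 2010, 2000),
    life_expectancy = c(60, 70, 80)
)

result <- (
    toy
    %>% summarise(
        min_life_expectancy = min( life_expectancy ),
        average_life_expectancy = mean( life_expectancy ),
        max_life_expectancy = max( life_expectancy ),
        number_of_rows = n()
    )
)

print(result)  # one row: min 60, average 70, max 80, 3 rows
```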

Hey! It doesn't work!

Question

Any idea why not?

Oh, we have some rows with life_expectancy = NA, and min(), mean() and max() all return NA if any of their inputs is NA. No problem, let's get rid of those rows:

(
    reformatted
    %>% filter( !is.na( life_expectancy ))    # drop rows with a missing life expectancy
    %>% summarise(
        min_life_expectancy = min( life_expectancy ),
        average_life_expectancy = mean( life_expectancy ),
        max_life_expectancy = max( life_expectancy ),
        number_of_rows = n()
    )
)
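The NA behaviour behind the first failure can be checked directly in base R. This sketch uses a made-up vector `x` (hypothetical values) with one missing entry, and shows the na.rm = TRUE argument that min(), mean() and max() accept for skipping missing values.

```r
# Made-up vector with one missing value
x <- c(50, NA, 80)

min(x)                  # NA: a single missing value makes the whole result NA
min(x, na.rm = TRUE)    # 50
mean(x, na.rm = TRUE)   # 65
max(x, na.rm = TRUE)    # 80
```

The same argument works inside summarise(), e.g. min( life_expectancy, na.rm = TRUE ), as an alternative to filtering first. One difference to keep in mind: without the filter() step, n() still counts the rows whose life expectancy is NA.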

Hmm... that range of life expectancy is huge.

Question

Use filter() to find the rows with those minimum and maximum life expectancies. What countries / years are these?

Note on n()

In the above, we used n() to give a count of the rows in each group. Since we did not group the data, the whole data frame counts as a single group, so n() gives the total number of rows. n() is actually a special function defined in dplyr. You can see other useful ones [in the summarise() documentation](https://dplyr.tidyverse.org/reference/summarise.html#useful-functions). (It's also possible to write your own.)

You don't need to know all these, but n() is particularly useful to remember.
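As a rough illustration of "rows in each group", the sketch below runs n() on hypothetical toy data twice: once ungrouped, and once after group_by( country ), where it counts rows per country. It also uses n_distinct(), one of the other helpers listed in the summarise() documentation. The names `toy`, `overall` and `per_country` are made up for this example.

```r
library(dplyr)

# Hypothetical toy data: country A has two rows, country B has one
toy <- data.frame(
    country = c("A", "A", "B"),
    year = c(2000, 2010, 2000),
    life_expectancy = c(60, 70, 80)
)

# No grouping: the whole data frame is one group
overall <- (
    toy
    %>% summarise(
        number_of_rows = n(),                        # 3
        number_of_countries = n_distinct( country )  # 2
    )
)

# Grouped by country: n() counts the rows within each country
per_country <- (
    toy
    %>% group_by( country )
    %>% summarise( number_of_rows = n(), .groups = "drop" )
)

print(per_country)  # A: 2 rows, B: 1 row
```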