SUMMARISE or AGGREGATE combines across rows
Maybe we don't want data for every country but a summary of all countries put together? For example, what's the minimum, maximum, or average life expectancy across all countries and years?
This is a job for... summarise:
(
reformatted
%>% summarise(
min_life_expectancy = min( life_expectancy ),
average_life_expectancy = mean( life_expectancy ),
max_life_expectancy = max( life_expectancy ),
number_of_rows = n()
)
)
Hey! It doesn't work!
Any idea why not?
Oh, we have some rows with life_expectancy = NA, and min(), mean() and max() all return NA if any of their input values is missing. No problem, let's get rid of them:
(
reformatted
%>% filter( !is.na( life_expectancy ))
%>% summarise(
min_life_expectancy = min( life_expectancy ),
average_life_expectancy = mean( life_expectancy ),
max_life_expectancy = max( life_expectancy ),
number_of_rows = n()
)
)
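Filtering is not the only way to deal with the missing values. As a rough sketch, base R's min(), mean() and max() each accept an na.rm = TRUE argument that drops NAs inside the function. The `toy` data frame below is made up for illustration and only stands in for `reformatted`:

```r
library(dplyr)

# Made-up data standing in for `reformatted` (values are illustrative only)
toy <- tibble(
  country = c("A", "A", "B", "B"),
  life_expectancy = c(50, NA, 70, 80)
)

# Without na.rm, a single NA makes the summary NA
with_na <- (
  toy
  %>% summarise( average_life_expectancy = mean( life_expectancy ))
)

# na.rm = TRUE drops the missing values inside each summary function
without_na <- (
  toy
  %>% summarise(
    min_life_expectancy = min( life_expectancy, na.rm = TRUE ),
    average_life_expectancy = mean( life_expectancy, na.rm = TRUE ),
    max_life_expectancy = max( life_expectancy, na.rm = TRUE ),
    number_of_rows = n(),                                  # still counts the NA row
    number_non_missing = sum( !is.na( life_expectancy ))   # counts only non-missing values
  )
)
```

One difference from the filter() approach: the rows with NA are still in the data, so n() counts them too. The same na.rm arguments should work in the summarise() call on `reformatted`.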
Hmm... that range of life expectancy is huge.
Use filter() to find the rows with those minimum and maximum life expectancies. What countries / years are these?
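If you get stuck, here is one possible shape for that filter, sketched on made-up data. The `toy` data frame and its `country` and `year` columns are assumptions for illustration; swap in `reformatted` and its real column names:

```r
library(dplyr)

# Made-up data; the country and year column names are assumed, not taken from `reformatted`
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1990, 2000, 1990, 2000),
  life_expectancy = c(50, NA, 70, 80)
)

extremes <- (
  toy
  # Drop missing values first, otherwise min() and max() would return NA
  %>% filter( !is.na( life_expectancy ))
  # Keep only the rows that hit the overall minimum or maximum
  %>% filter(
    life_expectancy == min( life_expectancy )
    | life_expectancy == max( life_expectancy )
  )
)
```

Inside filter(), min() and max() are computed over the whole column, so each row is compared against the overall extremes.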
In the above, we used n() to give a count of the rows in each group (here there is just one group: the whole data frame).
This is actually a special function defined in dplyr.
You can see other useful ones [in the summarise() documentation](https://dplyr.tidyverse.org/reference/summarise.html#useful-functions).
(It's also possible to write your own.)
You don't need to know all these, but n() is particularly useful to remember.
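To make this concrete, here is a small sketch that combines n() with n_distinct() (another dplyr helper listed in that documentation) and a home-made summary function. The `toy` data and the `value_range()` function are hypothetical, invented just for this example:

```r
library(dplyr)

# Made-up data for illustration
toy <- tibble(
  country = c("A", "A", "B", "B", "C"),
  life_expectancy = c(50, 60, 70, 80, 75)
)

# A hypothetical home-made summary function: takes a vector, returns a single value
value_range <- function( x ) {
  max( x ) - min( x )
}

summary_table <- (
  toy
  %>% summarise(
    number_of_rows = n(),                                   # rows in the data
    number_of_countries = n_distinct( country ),            # unique country values
    life_expectancy_range = value_range( life_expectancy )  # our own function
  )
)
```

Any function that takes a vector and returns a single value can be used inside summarise() in this way.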