
ARRANGE or SORT arranges rows by column values

All very well but... which countries have the highest life expectancy? Or the lowest?

If you completed the question in the group-by section, you'll have computed a life expectancy value per country, with something like:

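X = (
    reformatted
    %>% filter( !is.na( life_expectancy ))
    %>% group_by( country )
    %>% summarise(
        min_life_expectancy = min( life_expectancy ),
        average_life_expectancy = mean( life_expectancy ),
        max_life_expectancy = max( life_expectancy ),
        # needed for the GDP vs life expectancy plot further down;
        # assumes the GDP column in reformatted is called gdp_per_capita
        average_gdp_per_capita = mean( gdp_per_capita, na.rm = TRUE ),
        number_of_rows = n()
    )
)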
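To see what this summary looks like without the full dataset, here is a minimal sketch on a tiny made-up data frame. The names `reformatted_toy` and `X_toy`, the country names and all the values are hypothetical, chosen only so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical long-format data: one row per country per year.
# Country names and values are made up purely for illustration.
reformatted_toy <- tibble(
    country         = c("country_a", "country_a", "country_b", "country_b", "country_c"),
    year            = c(2000, 2010, 2000, 2010, 2000),
    life_expectancy = c(70, 74, 60, NA, 80)
)

X_toy <- (
    reformatted_toy
    %>% filter( !is.na( life_expectancy ))   # drops country_b's NA row
    %>% group_by( country )
    %>% summarise(
        average_life_expectancy = mean( life_expectancy ),
        number_of_rows = n()
    )
)
print( X_toy )
```

Each country ends up as a single row, which is exactly the shape that arrange() will sort.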

To find the highest or lowest, all we have to do is arrange (that is, sort) the rows by average life expectancy:

X %>% arrange( average_life_expectancy )

By default, arrange() sorts in ascending order, so you should see the countries with the lowest average life expectancy at the top of the table.
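A small self-contained check of the default ascending order, using a hypothetical `X_toy` table with made-up countries and values (not real data):

```r
library(dplyr)

# Hypothetical per-country summary (made-up values, not real data)
X_toy <- tibble(
    country                 = c("country_a", "country_b", "country_c"),
    average_life_expectancy = c(72, 60, 80)
)

# arrange() sorts ascending by default, so the smallest value comes first
sorted_toy <- X_toy %>% arrange( average_life_expectancy )
print( sorted_toy )
```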

What about the highest life expectancy? We want to do the same sort but in descending order, which in dplyr is done using desc(column_name):

X %>% arrange( desc( average_life_expectancy ))
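The same kind of hypothetical toy table can show desc() in action. As a sketch, pairing the descending sort with base R's `head()` gives a quick "top N" list (the table and its values are made up):

```r
library(dplyr)

# Hypothetical per-country summary (made-up values, not real data)
X_toy <- tibble(
    country                 = c("country_a", "country_b", "country_c"),
    average_life_expectancy = c(72, 60, 80)
)

# desc() flips the sort order, so the largest value comes first
top_toy <- X_toy %>% arrange( desc( average_life_expectancy ) )

# head() then keeps just the first rows, here the two highest
top_two <- head( top_toy, 2 )
print( top_two )
```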

Maybe what we actually wanted was to plot GDP against life expectancy. That should be easy now, right? At least it is if we restrict to our chosen countries:

(
    X
    %>% filter( country %in% countries )
    %>% ggplot(
        mapping = aes(
            x = average_gdp_per_capita,
            y = average_life_expectancy,
            colour = country
        )
    ) + geom_point( size = 4 )
    + xlab( "GDP per capita (average over years)" )
    + ylab( "Life expectancy (average over years)" )
    + theme_minimal(24)
)

However... hmm, wouldn't it be better if the colour legend listed the countries in order from highest to lowest life expectancy?

In ggplot2, the trick is to convert the column into something that carries its own order. A column of strings like this one is shown in alphabetical order by default, so to control the order we convert it into a factor. (A factor is a variable that takes one of a fixed set of values, called its levels, stored in a specific order; ggplot2 uses that level order for legends and axes.)
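A minimal base-R sketch of how factor levels control ordering; the country names are hypothetical placeholders. It also shows one pitfall worth knowing: any value that is missing from `levels` becomes `NA`.

```r
countries_toy <- c("country_b", "country_c", "country_a")

# Default: levels are the sorted unique values (alphabetical for these strings)
default_factor <- factor( countries_toy )
print( levels( default_factor ) )   # "country_a" "country_b" "country_c"

# Explicit levels: the order we supply is the order that is used
ordered_factor <- factor( countries_toy, levels = c("country_c", "country_a", "country_b") )
print( levels( ordered_factor ) )   # "country_c" "country_a" "country_b"

# The values themselves are unchanged; only the level order differs
print( as.character( ordered_factor ) )   # "country_b" "country_c" "country_a"

# Pitfall: a value not listed in levels turns into NA
missing_level_factor <- factor( "country_d", levels = c("country_a", "country_b") )
print( is.na( missing_level_factor ) )   # TRUE
```

This is why it matters that the levels are built from the same filtered table as the plotted data: every plotted country then has a matching level.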

So first, let's use arrange() to get the countries in the right order:

ordered_countries = (
    X
    %>% filter( country %in% countries )
    %>% arrange( desc(average_life_expectancy) )
)$country
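On a hypothetical toy table, this step produces a plain character vector in descending order of life expectancy. dplyr's `pull()` is an equivalent, pipe-friendly alternative to `$country`; both are shown so the results can be compared. Here `chosen_countries_toy` is a made-up stand-in for the `countries` vector.

```r
library(dplyr)

# Hypothetical per-country summary (made-up values, not real data)
X_toy <- tibble(
    country                 = c("country_a", "country_b", "country_c"),
    average_life_expectancy = c(72, 60, 80)
)
chosen_countries_toy <- c("country_a", "country_c")   # hypothetical selection

# Extract the column with $ after the pipeline, as in the text
ordered_countries_dollar <- (
    X_toy
    %>% filter( country %in% chosen_countries_toy )
    %>% arrange( desc( average_life_expectancy ) )
)$country

# Same result, extracting the column inside the pipeline with pull()
ordered_countries_pull <- (
    X_toy
    %>% filter( country %in% chosen_countries_toy )
    %>% arrange( desc( average_life_expectancy ) )
    %>% pull( country )
)
print( ordered_countries_pull )   # "country_c" "country_a"
```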

Then let's redraw the plot, using mutate() to add a correctly ordered column on the fly:

(
    X
    %>% filter( country %in% countries )
    %>% mutate(
        ordered_country = factor( country, levels = ordered_countries ) # <-- add ordered_country variable here
    )
    %>% ggplot(
        mapping = aes(
            x = average_gdp_per_capita,
            y = average_life_expectancy,
            colour = ordered_country # <-- use ordered_country variable here
        )
    ) + geom_point( size = 4 )
    + xlab( "GDP per capita (average over years)" )
    + ylab( "Life expectancy (average over years)" )
    + theme_minimal(24)
)
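To check the ordering without drawing anything, this sketch inspects the levels of `ordered_country` on a hypothetical toy table (made-up countries and values). It also shows a swapped-in alternative that the text above does not use: base R's `reorder()` (from the stats package), which orders levels by the increasing mean of its second argument, so negating life expectancy puts the highest first.

```r
library(dplyr)

# Hypothetical per-country summary (made-up values, not real data)
X_toy <- tibble(
    country                 = c("country_a", "country_b", "country_c"),
    average_life_expectancy = c(72, 60, 80)
)

# Approach from the text: build the level order with arrange(), apply it with factor()
ordered_countries_toy <- (
    X_toy
    %>% arrange( desc( average_life_expectancy ) )
)$country

plot_data <- (
    X_toy
    %>% mutate( ordered_country = factor( country, levels = ordered_countries_toy ) )
)
print( levels( plot_data$ordered_country ) )   # "country_c" "country_a" "country_b"

# Alternative (not the text's method): reorder() sorts levels by the mean of
# its second argument, ascending; negating gives highest life expectancy first
plot_data_reorder <- (
    X_toy
    %>% mutate( ordered_country = reorder( country, -average_life_expectancy ) )
)
print( levels( plot_data_reorder$ordered_country ) )   # same order as above
```

Either column can be passed as `colour =` in `aes()`; the legend follows the level order.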