Part 1: Changing the mode of inheritance

From the earlier session it certainly looked like O blood group is associated with severe malaria outcomes.

But how do we know this was really an effect of O blood group, which is a recessive trait? Could the variant instead have acted additively?

To find out, let's load the data again and encode the status variable as before:

library( tidyverse )
data = (
    read_tsv( "https://www.chg.ox.ac.uk/bioinformatics/training/msc_gm/2024/data/o_bld_group_data.tsv" )
    %>% mutate(
        status = factor( status, levels = c( "CONTROL", "CASE" ))
    )
)

Challenge 1

Encode three new variables o_add, o_dom and o_rec to represent additive, dominant and recessive encodings of the O blood group variant rs8176719. (Note: the deletion allele is associated with O blood group, so orient the encodings to count copies of the deletion allele.)

Hints and solutions
Hint 1
Hint 2
Solution

Try to do this yourself! If you get stuck, look at the tabs above for hints.

The first thing to do (of course) is look at the values in the data:

table( data$rs8176719 )

For example you could encode an additive version like this:

data$o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ data$rs8176719 ]

or, tidyverse-style:

data = (
    data
    %>% mutate(
        o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ rs8176719 ]
    )
)

You could encode them all at once using one `mutate()`, like this:

data = (
    data
    %>% mutate(
        o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ rs8176719 ],
        o_dom = c( 'C/C' = 0, '-/C' = 1, '-/-' = 1 )[ rs8176719 ],
        o_rec = c( 'C/C' = 0, '-/C' = 0, '-/-' = 1 )[ rs8176719 ]
    )
)

Note: after doing this type of encoding, you should always compare the new values against the original ones to check you have got it right! E.g.

table( data$rs8176719, data$o_add )
# etc.
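As a minimal sketch of this kind of check, the lines below apply all three encodings to a small made-up genotype vector (toy values, not the course data) and cross-tabulate each one against the original genotypes. The names `geno`, `toy` and the `*_map` lookup vectors are hypothetical and only used for illustration.

```r
# Toy genotypes for illustration only (not the real data); NA = missing call
geno = c( 'C/C', '-/C', '-/-', '-/C', 'C/C', '-/-', NA )

# Lookup vectors: names are genotypes, values are the encodings
add_map = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )
dom_map = c( 'C/C' = 0, '-/C' = 1, '-/-' = 1 )
rec_map = c( 'C/C' = 0, '-/C' = 0, '-/-' = 1 )

toy = data.frame(
    rs8176719 = geno,
    o_add = unname( add_map[ geno ] ),
    o_dom = unname( dom_map[ geno ] ),
    o_rec = unname( rec_map[ geno ] )
)

# Cross-tabulate each encoding against the original genotypes.
# useNA = "ifany" also shows what happened to missing genotypes.
print( table( toy$rs8176719, toy$o_add, useNA = "ifany" ) )
print( table( toy$rs8176719, toy$o_dom, useNA = "ifany" ) )
print( table( toy$rs8176719, toy$o_rec, useNA = "ifany" ) )
```

Each genotype row should have counts in exactly one column. Using `useNA = "ifany"` is a choice made here so that missing genotypes are visible in the check rather than silently dropped.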

Challenge 2

Use logistic regression to test for association between each of your encodings of the rs8176719 genotype above and malaria case status (the status variable). What do you conclude?

Hints and solutions
Hint 1
Hint 2
Another suggestion

Can you remember how to do this?

Something like this:

fit1 = glm(
    status ~ o_add,
    data = data %>% filter( country == 'Kenya' ),
    family = "binomial"
)
summary(fit1)$coefficients
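To compare the encodings side by side, one option is to fit the same model once per encoding and collect the genotype row of each coefficient table. The sketch below does this on simulated toy data (made up here, with an arbitrary effect chosen purely so the code runs; it is not the course data and has no country column to filter on). The names `toy`, `encodings` and `results` are hypothetical.

```r
set.seed( 1 )  # make the toy simulation reproducible

# Simulated toy data: number of copies of the deletion allele (0, 1 or 2)
n = 500
toy = data.frame( o_add = rbinom( n, size = 2, prob = 0.6 ) )
toy$o_dom = as.integer( toy$o_add >= 1 )
toy$o_rec = as.integer( toy$o_add == 2 )

# Case status simulated with an arbitrary effect (an assumption, not a result)
is_case = rbinom( n, size = 1, prob = plogis( 0.3 - 0.5 * toy$o_rec ) ) == 1
toy$status = factor(
    ifelse( is_case, "CASE", "CONTROL" ),
    levels = c( "CONTROL", "CASE" )
)

# Fit one logistic regression per encoding and keep only the genotype row
encodings = c( "o_add", "o_dom", "o_rec" )
results = do.call( rbind, lapply( encodings, function( name ) {
    fit = glm(
        reformulate( name, response = "status" ),  # builds e.g. status ~ o_add
        data = toy,
        family = "binomial"
    )
    summary( fit )$coefficients[ name, , drop = FALSE ]
}))
print( results )
```

With the real data you would replace `toy` by the filtered data frame used above; the rows of `results` can then be compared directly.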

You could also include two variables together:

# Use a new name so the single-variable fit above is not overwritten
fit2 = glm(
    status ~ o_add + o_dom,
    data = data %>% filter( country == 'Kenya' ),
    family = "binomial"
)
summary(fit2)$coefficients

Which one 'wins out'?
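One thing worth keeping in mind when combining encodings: with the definitions from Challenge 1, o_add is always equal to o_dom + o_rec, so any two of the three already distinguish all three genotype classes and the third adds no new information. A minimal sketch on made-up toy values (not the course data; the 0/1 outcome here is arbitrary noise) illustrates this. The name `fit_all` is hypothetical.

```r
set.seed( 2 )  # make the toy simulation reproducible

# Toy genotype encodings, defined as in Challenge 1
n = 300
o_add = rbinom( n, size = 2, prob = 0.6 )
o_dom = as.integer( o_add >= 1 )
o_rec = as.integer( o_add == 2 )

# Arbitrary 0/1 outcome, unrelated to genotype (illustration only)
status = rbinom( n, size = 1, prob = 0.5 )

# The additive encoding is the sum of the other two
stopifnot( all( o_add == o_dom + o_rec ) )

# Including all three: one coefficient cannot be estimated and is reported as NA
fit_all = glm( status ~ o_add + o_dom + o_rec, family = "binomial" )
print( coef( fit_all ) )
```

This is why the two-variable models above are the most that can be fitted: each one is a different way of writing a model with a separate effect for heterozygotes and for deletion homozygotes.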