Part 1: Changing the mode of inheritance
From the earlier session it certainly looked like O blood group is associated with severe malaria outcomes.
But how do we know it was really an effect of (the recessive trait) O blood group? Could the variant have acted additively?
To find out, let's load the data again and encode the status variable as before:
library( tidyverse )
data = (
  read_tsv( "https://www.chg.ox.ac.uk/bioinformatics/training/msc_gm/2024/data/o_bld_group_data.tsv" )
  %>% mutate(
    status = factor( status, levels = c( "CONTROL", "CASE" ))
  )
)
Encode three new variables o_add, o_dom, o_rec to represent additive, dominant and recessive encodings of the O
blood group variant rs8176719. (Note. the deletion allele is associated with O blood group, so orient things that way.)
- Hints and solutions
- Hint 1
- Hint 2
- Solution
The first thing to do (of course) is look at the values in the data:
table( data$rs8176719 )
For example you could encode an additive version like this:
data$o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ data$rs8176719 ]
or tidyverse-style:
data = (
  data
  %>% mutate(
    o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ rs8176719 ]
  )
)
A full solution, encoding all three variables at once:
data = (
  data
  %>% mutate(
    o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ rs8176719 ],
    o_dom = c( 'C/C' = 0, '-/C' = 1, '-/-' = 1 )[ rs8176719 ],
    o_rec = c( 'C/C' = 0, '-/C' = 0, '-/-' = 1 )[ rs8176719 ]
  )
)
Note. After doing this type of encoding, you should always compare the new values against the original ones to check you have got it right! E.g.
table( data$rs8176719, data$o_add )
# etc.
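The "etc." can be spelled out. Here is a minimal sketch of the same check for all three encodings, run on a small made-up data frame (`toy` and its six genotype values are invented for illustration and are not the real dataset):

```r
# Hypothetical toy data: two individuals per genotype (not the real dataset)
toy = data.frame(
  rs8176719 = c( 'C/C', 'C/C', '-/C', '-/C', '-/-', '-/-' ),
  stringsAsFactors = FALSE   # genotypes must be character, not factor, for name-based lookup
)
toy$o_add = c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ toy$rs8176719 ]
toy$o_dom = c( 'C/C' = 0, '-/C' = 1, '-/-' = 1 )[ toy$rs8176719 ]
toy$o_rec = c( 'C/C' = 0, '-/C' = 0, '-/-' = 1 )[ toy$rs8176719 ]

# Cross-tabulate each encoding against the original genotypes.
# useNA = "ifany" also shows any genotype that failed to match the lookup (encoded as NA).
table( toy$rs8176719, toy$o_add, useNA = "ifany" )
table( toy$rs8176719, toy$o_dom, useNA = "ifany" )
table( toy$rs8176719, toy$o_rec, useNA = "ifany" )

# The same check, automated: every genotype should map to exactly one encoded value
for( column in c( "o_add", "o_dom", "o_rec" )) {
  counts = table( toy$rs8176719, toy[[column]], useNA = "ifany" )
  stopifnot( all( rowSums( counts > 0 ) == 1 ) )
}
```

In each table, every genotype row should have all of its counts in a single column. Note that the lookup by name only works if the genotype column is a character vector; if it were a factor, R would index by the underlying integer codes instead.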
Use logistic regression to test for association between each of your encodings of the rs8176719 genotype above and malaria case status (in the status variable). What do you conclude?
- Hints and solutions
- Hint 1
- Hint 2
- Another suggestion
Can you remember how to do this?
It's best either to restrict the analysis to one country (like Kenya) or else to include country as a covariate. This is because different numbers of cases and controls were collected in different countries (of course), so it makes sense to allow them to have different baseline case frequencies.
Something like this:
fit1 = glm(
  status ~ o_add,
  data = data %>% filter( country == 'Kenya' ),
  family = "binomial"
)
summary(fit1)$coefficients
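The other option, including country as a covariate instead of filtering, could be sketched as follows. This runs on simulated data: the data frame `sim`, the second country name, and all the effect sizes are made up for illustration. On the real data you would pass `data` and keep the same formula.

```r
set.seed( 1 )
n = 2000

# Simulated stand-in for the real data (all values are assumptions)
sim = data.frame(
  country = sample( c( 'Kenya', 'Gambia' ), n, replace = TRUE ),
  o_add = sample( 0:2, n, replace = TRUE, prob = c( 0.15, 0.45, 0.40 ) )
)

# Assumed model: each country has its own baseline log-odds of being a case,
# and each copy of the deletion allele shifts the log-odds by an arbitrary -0.3
log_odds = ifelse( sim$country == 'Kenya', 0.5, -0.5 ) - 0.3 * sim$o_add
sim$status = factor(
  ifelse( runif( n ) < plogis( log_odds ), "CASE", "CONTROL" ),
  levels = c( "CONTROL", "CASE" )
)

# Country enters as a covariate, giving each country its own baseline
fit_cov = glm(
  status ~ o_add + country,
  data = sim,
  family = "binomial"
)
summary( fit_cov )$coefficients
```

The row for o_add is then the genotype effect estimated after allowing for the country differences, and the country row captures the difference in baseline between the two countries.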
You could also include two variables together:
fit1 = glm(
  status ~ o_add + o_dom,
  data = data %>% filter( country == 'Kenya' ),
  family = "binomial"
)
summary(fit1)$coefficients
Which one 'wins out'?
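One thing worth keeping in mind when fitting encodings together: the three encodings are linearly related (o_add = o_dom + o_rec), so any two of them already distinguish all three genotype groups, and a third adds no information. Here is a small sketch illustrating this on simulated data (`sim` and its values are invented, and status is generated at random with no genotype effect):

```r
set.seed( 2 )
n = 500

# Simulated genotypes and a random case/control status (assumptions, not real data)
geno = sample( c( 'C/C', '-/C', '-/-' ), n, replace = TRUE )
sim = data.frame(
  o_add = unname( c( 'C/C' = 0, '-/C' = 1, '-/-' = 2 )[ geno ] ),
  o_dom = unname( c( 'C/C' = 0, '-/C' = 1, '-/-' = 1 )[ geno ] ),
  o_rec = unname( c( 'C/C' = 0, '-/C' = 0, '-/-' = 1 )[ geno ] ),
  status = factor(
    sample( c( "CONTROL", "CASE" ), n, replace = TRUE ),
    levels = c( "CONTROL", "CASE" )
  )
)

# The additive encoding is exactly the sum of the other two
stopifnot( all( sim$o_add == sim$o_dom + sim$o_rec ) )

# Fitting all three at once leaves one coefficient inestimable, reported as NA
fit_all = glm(
  status ~ o_add + o_dom + o_rec,
  data = sim,
  family = "binomial"
)
coef( fit_all )
```

So a model with two encodings, such as o_add + o_dom, is already the most general model for three genotypes; the question is which of the two terms carries the signal.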