Part 2: Testing for association with A/B blood type
From the earlier data it certainly looked like O blood group is associated with severe malaria outcomes. It might have been an additive or a recessive effect - not quite clear.
But what about A/B blood group?
Including the variants as separate predictors
The second variant in this data, rs8176746, encodes A/B blood type, with the 'T' allele encoding 'B' blood type and the 'G' allele (which the reference sequence carries) encoding 'A' blood type. For the most part, the O blood group mutation only occurs on an A blood type background.
Satisfy yourself of this by tabling the two variants in the data.
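One way to tabulate the two variants is a simple cross-table. The sketch below uses a small made-up data frame called `toy` so that it runs on its own; it assumes the O variant is stored in a column named `rs8176719` (the name used in the genotype table later in this section), so substitute your real `data` and column names.

```r
# Toy data for illustration only; in the practical, use the real `data`.
toy = data.frame(
	rs8176719 = c( 'C/C', '-/C', '-/-', '-/C', 'C/C', '-/-' ),
	rs8176746 = c( 'G/T', 'G/G', 'G/G', 'G/T', 'T/T', 'G/G' )
)

# Cross-tabulate the two genotypes; useNA = "ifany" also shows missing calls
genotype_table = table(
	O_variant = toy$rs8176719,
	AB_variant = toy$rs8176746,
	useNA = "ifany"
)
print( genotype_table )
```

In the real data, look at how the '-/-' row is distributed across the rs8176746 genotypes.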
To test for association with A/B status, we could of course just put the A/B SNP in the model instead. For simplicity, I suggest encoding it as an additive model:
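data = (
	data
	%>% mutate(
		# Look up the allele count by genotype name; as.character() guards
		# against rs8176746 being stored as a factor (which would index by position)
		rs8176746_add = c( 'G/G' = 0, 'G/T' = 1, 'T/T' = 2 )[ as.character( rs8176746 ) ]
	)
)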
Use logistic regression to test for association between A/B blood type and severe malaria status. Are they associated? What is the estimated relative risk (odds ratio) and its confidence interval?
Note. As before, you should work in one country (such as Kenya) or, at the very least, include country as a covariate to account for sampling differences between countries.
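As a rough sketch of what such a fit might look like, the code below simulates a toy dataset and fits a logistic regression with country as a covariate. The outcome column `is_case`, the simulated effect size, and the genotype frequencies are all made up for illustration; only `rs8176746_add` and `country` correspond to names used in this practical. Logistic regression estimates log odds ratios, so the estimate and interval are exponentiated to put them on the odds ratio scale.

```r
set.seed( 1 )
n = 2000

# Simulated toy data; replace with the real `data` in the practical
toy = data.frame(
	country = sample( c( 'Kenya', 'Gambia' ), n, replace = TRUE ),
	rs8176746_add = sample( 0:2, n, replace = TRUE, prob = c( 0.7, 0.25, 0.05 ) )
)
# Simulate case/control status with a log odds ratio of 0.2 per 'T' allele
toy$is_case = rbinom( n, size = 1, prob = plogis( -0.1 + 0.2 * toy$rs8176746_add ) )

fit = glm(
	is_case ~ rs8176746_add + country,
	data = toy,
	family = binomial( link = "logit" )
)

# Row of the coefficient table for the A/B predictor (log odds ratio scale)
estimate = coef( summary( fit ) )[ 'rs8176746_add', ]
# Wald 95% confidence interval, also on the log odds ratio scale
ci = confint.default( fit )[ 'rs8176746_add', ]

print( estimate )
# Exponentiate to get the odds ratio and its interval
print( exp( c( OR = unname( estimate[ 'Estimate' ] ), ci ) ) )
```

To restrict to a single country instead, filter the data first and drop `country` from the formula.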
What happens if you also include the O blood group predictor (o_rec or o_add) in the model?
What is your conclusion about whether A/B blood type itself might cause protection against, or risk of, severe malaria?
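To see why the two models can give different answers, here is a small simulation sketch. It assumes three made-up haplotypes ('A', 'B' and 'O') with invented frequencies, and it assumes that only O affects risk; none of these numbers come from the real data, and `is_case` is a hypothetical outcome name. Because O and B sit on different haplotypes, the two allele counts are correlated, so a model with the A/B predictor alone can pick up some of the O effect.

```r
set.seed( 2 )
n = 5000

# Draw two haplotypes per individual: 'A' (C,G), 'B' (C,T) or 'O' (-,G).
# The frequencies are invented for illustration.
haps = matrix(
	sample( c( 'A', 'B', 'O' ), 2 * n, replace = TRUE, prob = c( 0.2, 0.15, 0.65 ) ),
	ncol = 2
)
toy = data.frame(
	o_add = rowSums( haps == 'O' ),          # number of O alleles
	rs8176746_add = rowSums( haps == 'B' )   # number of 'T' (B) alleles
)

# In this simulation only O affects risk (log odds ratio -0.3 per O allele)
toy$is_case = rbinom( n, size = 1, prob = plogis( 0.3 - 0.3 * toy$o_add ) )

# A/B predictor alone, then A/B and O together
fit_ab = glm( is_case ~ rs8176746_add, data = toy, family = binomial )
fit_both = glm( is_case ~ rs8176746_add + o_add, data = toy, family = binomial )

print( coef( summary( fit_ab ) ) )
print( coef( summary( fit_both ) ) )
```

Compare the `rs8176746_add` row between the two fits, then repeat the comparison on the real data (with country handled as described above).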
Extra challenge
For extra kudos, find a way to encode A/B/O blood group directly. This can be done approximately using the following table, which lists genotypes for rs8176719 and rs8176746:
combined genotype    blood group phenotype
-----------------    ---------------------
C/C  G/G             A
C/C  G/T             AB
C/C  T/T             B
-/C  G/G             A
-/C  G/T             B or A? (*)
-/C  T/T             B
-/-  G/G             O
-/-  G/T             O
-/-  T/T             O
Using this encoding, you could get an estimate for the association of each blood type (AB, B, or O) against the baseline type A.
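One possible way to build this encoding is a named lookup vector keyed on the pair of genotypes. This is only a sketch: the column name `abo`, the lookup name `abo_lookup` and the toy data frame are made up, and the ambiguous '-/C G/T' combination is set to missing here purely as an assumption (you may prefer to assign it to one of the types).

```r
# Toy data containing each genotype combination once; use the real `data` in practice
toy = data.frame(
	rs8176719 = c( 'C/C', 'C/C', 'C/C', '-/C', '-/C', '-/C', '-/-', '-/-', '-/-' ),
	rs8176746 = c( 'G/G', 'G/T', 'T/T', 'G/G', 'G/T', 'T/T', 'G/G', 'G/T', 'T/T' )
)

# Lookup keyed on "<rs8176719>:<rs8176746>".
# Assumption: the ambiguous '-/C:G/T' combination is treated as missing.
abo_lookup = c(
	'C/C:G/G' = 'A', 'C/C:G/T' = 'AB', 'C/C:T/T' = 'B',
	'-/C:G/G' = 'A', '-/C:G/T' = NA,   '-/C:T/T' = 'B',
	'-/-:G/G' = 'O', '-/-:G/T' = 'O',  '-/-:T/T' = 'O'
)

key = paste( toy$rs8176719, toy$rs8176746, sep = ':' )

# Putting 'A' first in the levels makes it the baseline in a regression
toy$abo = factor( unname( abo_lookup[ key ] ), levels = c( 'A', 'AB', 'B', 'O' ) )
print( toy )
```

With a factor like this as the predictor in the logistic regression, each fitted coefficient (for AB, B and O) is a log odds ratio relative to blood group A.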
Note. There is a worked solution in case you get stuck.