A/B example worked solution

Below is my worked solution to the A/B blood group question - don't peek unless you are stuck!

A basic approach

For simplicity I'll work using the data from Kenya again (but it would of course be best to look across all populations as we did for O blood group.)

A basic approach is to test for association with rs8176746 directly.

kenya = data %>% filter( country == "Kenya" )
fit5 = glm(
    status ~ rs8176746_dosage + ethnicity + PC_1 + PC_2 + PC_3 + PC_4 + PC_5,
    data = kenya,
    family = "binomial"
)
summary(fit5)$coeff

                        Estimate Std. Error    z value     Pr(>|z|)
(Intercept)           -0.4800681 0.10339123 -4.6432183 3.430236e-06
rs8176746_dosage       0.2993734 0.07601033  3.9385885 8.196236e-05
(etc.)

Hey - rs8176746 does look associated as well! Maybe B blood type is associated with higher risk?
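The estimate above is on the log-odds scale. As a quick aid to interpretation, here is a minimal sketch that converts it to an odds ratio with an approximate 95% Wald confidence interval. It uses only the two numbers printed in the output above; the variable names are mine and not part of the tutorial.

```r
# Values copied from the rs8176746_dosage row of summary(fit5)$coeff above
beta = 0.2993734
se = 0.07601033

# Odds ratio per additional copy of the allele counted by the dosage
odds_ratio = exp( beta )

# Approximate 95% Wald interval: computed on the log-odds scale, then exponentiated
ci = exp( beta + c( lower = -1, upper = 1 ) * qnorm( 0.975 ) * se )

print( round( c( odds_ratio = odds_ratio, ci ), 2 ))
```

The odds ratio comes out at roughly 1.35 per copy, with an interval that excludes 1, which is what the small P-value is telling us.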

However, what happens if we put both variants in at once?

fit5b = glm(
    status ~ rs8176746_dosage + o_bld_group + ethnicity + PC_1 + PC_2 + PC_3 + PC_4 + PC_5,
    data = kenya,
    family = "binomial"  # needed to specify logistic regression
)
summary(fit5b)$coeff

                         Estimate Std. Error    z value     Pr(>|z|)
(Intercept)           -0.26703980 0.12389203 -2.1554236 0.0311287008
rs8176746_dosage       0.13082707 0.09106683  1.4366051 0.1508302354
o_bld_group           -0.35029673 0.09279261 -3.7750500 0.0001599756
(etc.)

Oh. Now rs8176746 doesn't really look associated.
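To see the change side by side, the sketch below recomputes the Wald z-statistic and two-sided P-value for rs8176746 from the estimates and standard errors printed above, before and after adjusting for O blood group. The data frame and its column names are purely illustrative.

```r
# Estimates and standard errors for rs8176746_dosage, copied from the two outputs above
fits = data.frame(
    model = c( "fit5 (rs8176746 only)", "fit5b (adjusted for O)" ),
    estimate = c( 0.2993734, 0.13082707 ),
    se = c( 0.07601033, 0.09106683 )
)

# Wald z-statistic and two-sided P-value, matching the columns reported by summary()
fits$z = fits$estimate / fits$se
fits$p = 2 * pnorm( -abs( fits$z ))

# Does the approximate 95% interval for the log odds ratio include zero?
fits$ci_includes_zero = abs( fits$z ) < qnorm( 0.975 )

print( fits )
```

The estimate more than halves once O blood group is in the model, and its interval now includes zero.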

Hmm... what does it look like across populations?

estimates = (
    data
    %>% group_by( country )
    %>% reframe(
        fit_regression(
            pick( everything() ),
            status ~ rs8176746_dosage + o_bld_group + ethnicity + PC_1 + PC_2 + PC_3 + PC_4 + PC_5
        )
    )
)
estimates %>% filter( parameter == 'rs8176746_dosage' )
estimates %>% filter( parameter == 'o_bld_group' )

What you will probably see is that, when both variants are included together, O blood group still appears protective, consistently across populations, but the A/B blood group SNP appears inconsistently associated.

So what's going on?

Well, these variants lie close to each other on the chromosome, and if you tabulate them together you'll see that they are somewhat in linkage disequilibrium (correlated):

table( data$rs8176719, data$rs8176746 )

      G/G  G/T  T/T
 -/- 4621  264    7
 -/C 2331 2244   60
 C/C  331  510  287

(You can look them up in the UCSC genome browser to see exactly where they are.)
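One way to put a number on "somewhat correlated" is to compute the correlation between the two genotype dosages directly from the table. The sketch below does this using only the counts printed above. Note the assumption: this is a correlation between unphased genotype dosages, which is a common stand-in for haplotype-based linkage disequilibrium but not the same quantity.

```r
# Genotype counts copied from the table above (rows: rs8176719, columns: rs8176746)
counts = matrix(
    c( 4621,  264,   7,
       2331, 2244,  60,
        331,  510, 287 ),
    nrow = 3, byrow = TRUE,
    dimnames = list(
        rs8176719 = c( "-/-", "-/C", "C/C" ),
        rs8176746 = c( "G/G", "G/T", "T/T" )
    )
)

# Code each genotype as a dosage: copies of 'C' at rs8176719, copies of 'T' at rs8176746.
# as.vector() reads the matrix column by column, so the row dosage varies fastest.
c_dosage = rep( 0:2, times = 3 )
t_dosage = rep( 0:2, each = 3 )
n = as.vector( counts )

# Expand to one entry per individual and correlate the two dosages
r = cor( rep( c_dosage, n ), rep( t_dosage, n ))
print( c( r = r, r_squared = r^2 ))
```

A positive correlation here means that the non-deleted 'C' allele tends to travel with 'T', or equivalently that the O deletion tends to travel with 'G'.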

So this means the O blood group mutation is a potential confounder in the association between rs8176746 and malaria outcome. Specifically:

If we believe O blood group is protective, i.e. reduces risk of malaria
and O blood group tends to be associated with rs8176746 'G' alleles

...then it's a confounder:

$$\text{severe malaria status} \leftarrow \text{O blood group status} \rightarrow \text{rs8176746 genotypes}$$

It's at least possible that the A/B variant has no strong effect on malaria itself, and that its apparent association is just picking up the O blood group effect.

Encoding A/B/O directly

A more nuanced way to solve this problem is to encode the biologically relevant variable - A, AB, B or O blood type - directly. The biology works as follows: each individual has two copies of the chromosome, and each copy carries an allele of rs8176746 that determines either the A or the B antigen. Each copy might also carry the loss-of-function 'O' deletion. Based on this we can call A/B/O blood type as follows:

combined genotype  blood group phenotype
-----------------  ---------------------
    C/C  G/G               A
    C/C  G/T               AB
    C/C  T/T               B
    -/C  G/G               A
    -/C  G/T            B or A? (*)
    -/C  T/T               B
    -/-  G/G               O
    -/-  G/T               O
    -/-  T/T               O

The cell marked (*) is the only difficult one here - both variants are heterozygous, and we don't know from this data how their alleles are arranged together on the two chromosomes (that is, their phase).

However, here we are helped by the fact that the O blood type mutation almost always occurs on the 'A' type background (i.e. on chromosomes with the 'G' allele at rs8176746), as we saw in the table above. With a few exceptions, type O individuals have the G/G genotype at rs8176746, and the genotype counts among heterozygous -/C individuals are also consistent with most O type haplotypes carrying the 'G' allele.
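As a rough check of this claim (not a formal phasing analysis), we can count alleles among the -/- individuals in the table above. Both of their chromosomes carry the deletion, so every 'T' allele they carry must sit on a deletion-carrying chromosome. The variable names below are illustrative.

```r
# Counts for the -/- (type O) row of the table above, by rs8176746 genotype
o_counts = c( "G/G" = 4621, "G/T" = 264, "T/T" = 7 )

# Every chromosome in these individuals carries the deletion, so counting their
# 'T' alleles gives the fraction of deletion-carrying chromosomes that carry 'T'
t_alleles = o_counts[["G/T"]] + 2 * o_counts[["T/T"]]
total_chromosomes = 2 * sum( o_counts )
t_fraction_on_o = t_alleles / total_chromosomes

print( round( t_fraction_on_o, 3 ))
```

Fewer than 3% of these chromosomes carry 'T', so in a doubly heterozygous individual the deletion is much more likely to sit with 'G', leaving the 'T' allele on the intact chromosome.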

For the sake of this tutorial we will therefore assume that these doubly-heterozygous individuals have B blood type. Let's encode this now:

data$abo_type = factor( NA, levels = c( "A", "B", "AB", "O" ))
data$abo_type[ data$rs8176719 == 'C/C' & data$rs8176746 == 'G/G' ] = 'A'
data$abo_type[ data$rs8176719 == 'C/C' & data$rs8176746 == 'G/T' ] = 'AB'
data$abo_type[ data$rs8176719 == 'C/C' & data$rs8176746 == 'T/T' ] = 'B'
data$abo_type[ data$rs8176719 == '-/C' & data$rs8176746 == 'G/G' ] = 'A'
data$abo_type[ data$rs8176719 == '-/C' & data$rs8176746 == 'G/T' ] = 'B'
data$abo_type[ data$rs8176719 == '-/C' & data$rs8176746 == 'T/T' ] = 'B'
data$abo_type[ data$rs8176719 == '-/-' ] = 'O'

...and fit it:

kenya = data %>% filter( country == 'Kenya' )
fit6 = glm(
    status ~ abo_type + ethnicity + PC_1 + PC_2 + PC_3 + PC_4 + PC_5,
    data = kenya,
    family = "binomial"  # needed to specify logistic regression
)
summary(fit6)$coeff

The baseline is now, of course, 'A' blood type individuals.
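The baseline is 'A' simply because it is the first level of the factor. If you would rather compare against a different blood type, base R's relevel() changes the reference level. Here is a minimal sketch on a toy factor (the values are made up, and abo_type_o_ref is just an illustrative name):

```r
# A toy factor with the same levels as abo_type (the values are made up)
abo_type = factor( c( "A", "O", "B", "O", "AB" ), levels = c( "A", "B", "AB", "O" ))
levels( abo_type )[1]    # the first level is the baseline used by glm()

# relevel() moves the chosen level to the front without changing the data
abo_type_o_ref = relevel( abo_type, ref = "O" )
levels( abo_type_o_ref )
```

Using the releveled factor in the model formula would report A, B and AB relative to O instead.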

Question

Is there any evidence that B, or AB blood type is associated with a different risk of malaria, compared to A?

Is there still evidence that O blood type is associated with lowered risk of malaria, compared to A?
