Using Bayes' theorem to investigate a COVID test

Note

The content of this page is optional for CM4 - Bayes' theorem will not be examined.

However, it's useful and turns up in other contexts (e.g. CM2) so may be worth knowing about.

Bayes' theorem tells us how to turn a model for some data into what we should believe about the underlying quantities of interest. That is, it lets us do scientific inference.

Suppose $Z$ is a thing we care about - maybe it takes a particular value $z$ - and we have collected some data. Bayes' theorem says:

$$P(Z=z|\text{data}) = \frac{P(\text{data}|Z=z)\cdot P(Z=z)}{\text{normalising constant}}$$

The first term in the numerator is the model for our data, given the value of $Z$, and the second term is what we knew about $Z$ before we saw the data.

We won't use Bayes' formula much in CM4, but as it turns up in several places and gives the key paradigm for making inferences about things of interest from data, it's worth doing an example here.

Example

To make this clear, let's apply Bayes' formula to this question:

Problem

Suppose you test for COVID using a lateral flow test. You get a positive test result. How convinced should you be that you are infected?

In this context, $Z$ is the outcome of interest - "am I infected?" - and the data is 'a +ve test result'. So Bayes' formula can be applied like this:

$$P\left(\text{infected}|\text{+ve test}\right) = \frac{P\left(\text{+ve test}|\text{infected}\right) \cdot \text{prior}(\text{infected})}{\text{denominator}}$$

Putting in test accuracy numbers

Luckily there are huge datasets from which to estimate these quantities! For example, this review of COVID-19 lateral flow test accuracy gives the following numbers:

  • It gives the "sensitivity" of the test, i.e. $P(\text{+ve test}|\text{infected})$, as 72% for symptomatic patients and 58% for asymptomatic patients.
  • And the "specificity", i.e. $P(\text{-ve test}|\text{not infected})$, as between 99.6% and 99.9%.

We can use these numbers in the calculation.
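As a minimal sketch of what these two numbers pin down, the snippet below tabulates the probability of each test result given each infection status. The function name `lateral_flow_likelihoods` is hypothetical, and the inputs are the symptomatic sensitivity (72%) and the lower-end specificity (99.6%) quoted above.

```python
def lateral_flow_likelihoods(sensitivity: float, specificity: float) -> dict:
    """Return P(test result | infection status) for all four combinations.

    Hypothetical helper for illustration only.
    """
    return {
        ("+ve", "infected"): sensitivity,            # true positive rate
        ("-ve", "infected"): 1.0 - sensitivity,      # false negative rate
        ("+ve", "not infected"): 1.0 - specificity,  # false positive rate
        ("-ve", "not infected"): specificity,        # true negative rate
    }


# Numbers quoted above: 72% sensitivity (symptomatic), 99.6% specificity.
likelihoods = lateral_flow_likelihoods(sensitivity=0.72, specificity=0.996)
for (result, status), p in likelihoods.items():
    print(f"P({result} test | {status}) = {p:.3f}")
```

Note that for each infection status the two probabilities sum to one, so the false positive rate is simply one minus the specificity.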

Note

There are also other estimates, such as this study in Welsh care homes, or this study. They suggest higher specificity closer to 99.9%.

Putting in infection rates

To apply Bayes, we also need to know how likely we thought we were to be infected before we took the test.

Population COVID rates are no longer being reported, so let's use numbers from 2023.

For example, for Oxfordshire, UK, the numbers in September 2023 peaked at around:

  • 21.8 cases per 100,000 people in under-59s
  • or 57.5 cases per 100,000 in the over-60s

Given the lack of systematic testing at the time, you could reasonably assume these are underestimates. To err on the side of caution, let's imagine the true rate to be substantially larger than this - say at least 100 in 100,000, i.e.

$$P(\text{infected}) = \frac{1}{1000}$$

Can you use Bayes' theorem to answer the question?

Note on the denominator

The denominator is easier to deal with than it seems.

Remember: it is just there to normalise the distribution, so it sums to one. So to compute the denominator we just need to sum up the numerator over all the possible values of $Z$. In our example this means 'Z = I am infected' and 'Z = I'm not infected'. So the denominator is:

$$\text{denominator} = P\left( \text{+ve}|\text{infected}\right) \cdot P\left(\text{infected} \right) + P\left( \text{+ve}|\text{not infected}\right)\cdot P\left(\text{not infected} \right)$$
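The calculation can be sketched in a few lines of Python. This is only an illustration: the function name `posterior_infected` and its argument names are hypothetical, and the numbers in the sanity check are made up (a test that is 90% sensitive and 90% specific, with a 50:50 prior) rather than taken from the COVID data.

```python
def posterior_infected(sensitivity: float, specificity: float, prior: float) -> float:
    """P(infected | +ve test) for a two-outcome Z (infected / not infected).

    Hypothetical helper; assumes all inputs are probabilities in [0, 1]
    and that a positive test is possible (denominator > 0).
    """
    # Numerator: P(+ve | infected) * P(infected)
    numerator = sensitivity * prior
    # Same quantity for the other value of Z:
    # P(+ve | not infected) * P(not infected)
    not_infected_term = (1.0 - specificity) * (1.0 - prior)
    # Denominator: sum of the numerator over both possible values of Z
    denominator = numerator + not_infected_term
    return numerator / denominator


# Sanity check with made-up numbers (not the COVID figures):
check = posterior_infected(sensitivity=0.9, specificity=0.9, prior=0.5)
print(f"posterior = {check:.3f}")
```

With a 50:50 prior and equal sensitivity and specificity, the posterior equals the sensitivity (0.9 here), which is a quick way to check that the normalisation is right.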

Now we can just fill in the numbers to compute the result:

$$P\left(\text{+ve test}|\text{infected}\right) = \text{sensitivity} = 72\%$$
$$P\left(\text{+ve test}|\text{not infected}\right) = 1 - \text{specificity} = 0.4\%$$
$$P\left(\text{infected}\right) = \frac{1}{1000}$$
$$P\left(\text{not infected}\right) = 1 - \frac{1}{1000} = \frac{999}{1000}$$

Challenge

Can you use these numbers to compute the probability you are infected?

How does this change with (for example) the specificity, or the prior rate of infection?
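One way to explore this, once you have tried the calculation by hand, is to loop over a few values and print the posterior for each. The sketch below is an assumption about how you might set this up, not part of the course material: `posterior_infected` is a hypothetical helper, the specificities are the two ends of the range quoted above, and the priors are the two reported Oxfordshire rates, the 1 in 1000 used above, and an extra made-up value of 1 in 100.

```python
def posterior_infected(sensitivity: float, specificity: float, prior: float) -> float:
    """P(infected | +ve test), normalising over infected / not infected."""
    numerator = sensitivity * prior
    denominator = numerator + (1.0 - specificity) * (1.0 - prior)
    return numerator / denominator


sensitivity = 0.72                 # symptomatic sensitivity quoted above
specificities = [0.996, 0.999]     # ends of the quoted specificity range
priors = [
    21.8 / 100_000,                # reported rate, younger age group
    57.5 / 100_000,                # reported rate, over-60s
    1 / 1000,                      # the inflated prior used above
    1 / 100,                       # made-up higher prior, for comparison only
]

for specificity in specificities:
    for prior in priors:
        posterior = posterior_infected(sensitivity, specificity, prior)
        print(
            f"specificity={specificity:.3f}  prior={prior:.5f}  "
            f"posterior={posterior:.3f}"
        )
```

Running this shows that the posterior rises both when the specificity rises (fewer false positives) and when the prior rate of infection rises.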

Bayes terminology

The terms in