Using Bayes' theorem to investigate a COVID test
The content of this page is optional for CM4: we won't be examining Bayes' theorem.
However, it's useful and turns up in other contexts (e.g. CM2), so it may be worth knowing about.
Bayes' theorem tells us how to turn a model for some data into what we should believe about the underlying quantities of interest. That is, it lets us do scientific inference.
Suppose $Z$ is a thing we care about (maybe whether it takes a particular value) and we have collected some data. Bayes' theorem says:

$$P(Z \mid \text{data}) = \frac{P(\text{data} \mid Z)\,P(Z)}{P(\text{data})}$$
The first term in the numerator on the right, $P(\text{data} \mid Z)$, is the model for our data if we knew the value of $Z$, and the second term, $P(Z)$, is what we knew about $Z$ before we saw the data.
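As a minimal sketch (Python is our choice here, and the function name `bayes_posterior` and its dictionary inputs are hypothetical), this is the formula for a $Z$ that can take a finite set of values. The denominator is computed by summing the numerator over every value of $Z$, which is what makes the posterior sum to one.

```python
def bayes_posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return P(Z = z | data) for each possible value z.

    prior[z]      -- P(Z = z), what we believed before seeing the data
    likelihood[z] -- P(data | Z = z), the model for the data
    """
    # Numerator of Bayes' theorem for each possible value of Z
    unnormalised = {z: likelihood[z] * prior[z] for z in prior}
    # Denominator P(data): the numerator summed over all values of Z
    evidence = sum(unnormalised.values())
    return {z: value / evidence for z, value in unnormalised.items()}


# Hypothetical example: two equally likely values of Z, with the data
# four times as likely under "a" as under "b"
posterior = bayes_posterior({"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2})
print(posterior)  # {'a': 0.8, 'b': 0.2}
```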
We won't use Bayes' formula much in CM4, but as it turns up in several places and gives the key paradigm for making inferences about things of interest from data, it's worth doing an example here.
Example
To make this concrete, let's apply Bayes' formula to this question:
Suppose you test for COVID using a lateral flow test and get a positive result. How convinced should you be that you are infected?
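In this context, $Z$ is the outcome of interest ("am I infected?") and the data is "a +ve test result". So Bayes' formula can be applied like this:

$$P(\text{infected} \mid \text{+ve}) = \frac{P(\text{+ve} \mid \text{infected})\,P(\text{infected})}{P(\text{+ve})}$$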
Putting in test accuracy numbers
Luckily there are large datasets from which to estimate these quantities. For example, this review of COVID-19 lateral flow test accuracy gives the following numbers:
- It gives the "sensitivity" of the test, i.e. $P(\text{+ve} \mid \text{infected})$, as 72% for symptomatic patients and 58% for asymptomatic ones.
- It gives the "specificity", i.e. $P(\text{-ve} \mid \text{not infected})$, as between 99.6% and 99.9%.
We can use these numbers in the calculation.
There are also other estimates, such as this study in Welsh care homes or this study, which suggest a higher specificity, closer to 99.9%.
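The accuracy numbers enter Bayes' theorem as likelihoods. Here is a small sketch, assuming sensitivity and specificity are given as plain fractions (the helper name `positive_test_likelihoods` is our own): the chance of a positive result when infected is the sensitivity, and the chance of a false positive when not infected is one minus the specificity.

```python
# Sensitivity and specificity estimates quoted above, as fractions
SENSITIVITY = {"symptomatic": 0.72, "asymptomatic": 0.58}
SPECIFICITY_RANGE = (0.996, 0.999)


def positive_test_likelihoods(sensitivity: float, specificity: float) -> dict[str, float]:
    """Return P(+ve | Z) for Z = infected and Z = not infected."""
    return {
        "infected": sensitivity,            # true positive rate
        "not infected": 1.0 - specificity,  # false positive rate
    }


for spec in SPECIFICITY_RANGE:
    lik = positive_test_likelihoods(SENSITIVITY["asymptomatic"], spec)
    print(f"specificity {spec:.1%}: P(+ve | not infected) = {lik['not infected']:.3%}")
```

Even the lower specificity estimate means only a fraction of a percent of uninfected people test positive; how much that matters depends on the prior.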
Putting in infection rates
To apply Bayes' theorem, we also need to know how likely we thought we were to be infected before we took the test, i.e. the prior $P(\text{infected})$.
Population COVID rates are no longer being reported, so let's use numbers from 2023.
For example, in Oxfordshire, UK, the numbers in September 2023 reached a maximum of around:
- 21.8 cases per 100,000 people in under-59s
- or 57.5 cases per 100,000 in the over-60s
Given the lack of systematic testing at the time, you could reasonably assume these are underestimates. To err on the side of caution, let's imagine the true rate is substantially larger than this, say at least 100 in 100,000, so we'll take

$$P(\text{infected}) = \frac{100}{100{,}000} = 0.001$$
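A case rate per 100,000 people converts directly into a prior probability. A tiny sketch using the figures above (the helper `rate_to_prior` is a hypothetical name):

```python
def rate_to_prior(cases_per_100k: float) -> float:
    """Convert a case rate per 100,000 people into a prior P(infected)."""
    return cases_per_100k / 100_000


# Reported Oxfordshire maxima (September 2023) and the assumed, more cautious rate
for label, rate in [("under-59s", 21.8), ("over-60s", 57.5), ("assumed", 100.0)]:
    print(f"{label}: P(infected) = {rate_to_prior(rate):.6f}")
```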
Can you use Bayes' theorem to answer the question?
The denominator is easier to deal with than it seems.
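Remember: it is just there to normalise the distribution so that it sums to one. So to compute the denominator, we just need to sum the numerator over all the possible values of $Z$. In our example these are "$Z$ = I am infected" and "$Z$ = I'm not infected", so the denominator is:

$$P(\text{+ve}) = P(\text{+ve} \mid \text{infected})\,P(\text{infected}) + P(\text{+ve} \mid \text{not infected})\,P(\text{not infected})$$

Here $P(\text{+ve} \mid \text{not infected}) = 1 - \text{specificity}$ is the false positive rate, and $P(\text{not infected}) = 1 - P(\text{infected})$.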
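A sketch of the denominator as a sum over the two values of $Z$, assuming sensitivity, specificity and prior are all fractions (`prob_positive` is a hypothetical name). The example uses the asymptomatic sensitivity, the lower specificity estimate and the assumed prior from above.

```python
def prob_positive(sensitivity: float, specificity: float, prior: float) -> float:
    """P(+ve): the numerator of Bayes' theorem summed over Z = infected / not infected."""
    p_pos_given_infected = sensitivity             # P(+ve | infected)
    p_pos_given_not_infected = 1.0 - specificity   # P(+ve | not infected)
    return p_pos_given_infected * prior + p_pos_given_not_infected * (1.0 - prior)


print(f"P(+ve) = {prob_positive(0.58, 0.996, 0.001):.6f}")
```

Most of this total comes from the false positive term, because almost everyone tested is not infected.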
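Now we can just fill in the numbers to compute the result:

$$P(\text{infected} \mid \text{+ve}) = \frac{P(\text{+ve} \mid \text{infected})\,P(\text{infected})}{P(\text{+ve} \mid \text{infected})\,P(\text{infected}) + P(\text{+ve} \mid \text{not infected})\,P(\text{not infected})}$$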
Can you use these numbers to compute the probability you are infected?
How does this change with (for example) the specificity, or the prior rate of infection?
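If you want to check your answers, here is one possible sketch that fills in Bayes' theorem for a grid of specificities and priors, using the asymptomatic sensitivity. The function name `posterior_infected` is our own, and the 99.8% specificity is an illustrative intermediate value rather than a figure from the studies.

```python
def posterior_infected(sensitivity: float, specificity: float, prior: float) -> float:
    """P(infected | +ve) from Bayes' theorem with Z = infected / not infected."""
    true_positives = sensitivity * prior                      # P(+ve | infected) P(infected)
    false_positives = (1.0 - specificity) * (1.0 - prior)     # P(+ve | not infected) P(not infected)
    return true_positives / (true_positives + false_positives)


sensitivity = 0.58  # asymptomatic estimate quoted above
for specificity in (0.996, 0.998, 0.999):
    for prior in (0.000218, 0.000575, 0.001):
        p = posterior_infected(sensitivity, specificity, prior)
        print(f"specificity={specificity:.1%}  prior={prior:.4%}  P(infected | +ve)={p:.1%}")
```

Try setting `sensitivity = 0.72` for the symptomatic case, and compare how much the output moves when you change the specificity versus the prior.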
Bayes terminology
The terms in