Computing with (co)variance

In the confounder simulation, we simulated a variable $B$ with variance $1$, and then made another variable $A$ that depends on $B$ like this:

A = \tfrac{1}{2} B + \text{noise}

or in R:

A = 0.5 * B + rnorm( N, mean = 0, sd = something )

It turns out that if $B$ has variance $1$ and we want $A$ to have variance $1$ as well, then the noise we add must have variance exactly $0.75$ (that is, `sd = sqrt(0.75)` in the R code, since `rnorm()` takes a standard deviation rather than a variance). Why is that?

The calculation is not very hard: it relies on two properties of the variance, which this page explains.
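Before working through the maths, we can check the claim empirically. The following is a minimal sketch rather than the original simulation code: the sample size `N` and the seed are arbitrary choices made here for illustration.

```r
# Empirical check that noise with variance 0.75 gives A variance (about) 1.
# N and the seed are illustrative assumptions, not values from the original simulation.
set.seed( 1 )
N = 100000

B = rnorm( N, mean = 0, sd = 1 )                  # B has variance 1
noise = rnorm( N, mean = 0, sd = sqrt( 0.75 ) )   # sd is the square root of the variance
A = 0.5 * B + noise

var( B )   # should be close to 1
var( A )   # should also be close to 1
```

The estimates will not be exactly $1$ because they are computed from a finite sample, but they should get closer as `N` grows.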

Properties of the variance

It turns out that the variance has two key properties which make this type of calculation easy. The properties are:

Variance property 1: The variance of a multiple of any variable $X$ scales like the multiple squared:

\begin{align} \text{var}( a \times X ) = a^2 \times \text{var}(X) \end{align}

and

Variance property 2: The variance of the sum of two independent variables, $\text{var}(X + Y)$, is just the sum of their variances:

\begin{align} \text{var}(X+Y) = \text{var}(X) + \text{var}(Y) \end{align}

These rules are just what we need for the calculation above: the first tells us how much variance the $\tfrac{1}{2} B$ term contributes, and the second tells us how much more the noise needs to add.
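Both properties can also be checked numerically. This is a rough sketch using made-up variables: `X`, `Y`, the multiplier `a`, the sample size and the seed are all assumptions chosen for illustration.

```r
# Numerical check of the two variance properties.
# All values here (sd, a, N, seed) are illustrative assumptions.
set.seed( 2 )
N = 100000
X = rnorm( N, mean = 0, sd = 2 )   # variance 4
Y = rnorm( N, mean = 0, sd = 3 )   # variance 9, simulated independently of X
a = 0.5

# Property 1: scaling by a multiplies the variance by a^2.
# This holds exactly for the sample variance as well (up to rounding).
var( a * X )
a^2 * var( X )

# Property 2: variances of independent variables add.
# In a sample this holds only approximately, because the sample
# covariance of X and Y is close to zero but not exactly zero.
var( X + Y )
var( X ) + var( Y )
```

The two numbers in each pair should agree: exactly (up to rounding) for property 1, and approximately for property 2.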

Challenge

Use these two properties to work out, on a piece of paper, how large the variance of the noise should be to make $A$ have variance $1$ again.

Properties of the covariance

That square-the-multiple behaviour always seems a bit complicated to me. I actually find these rules easiest to remember this way:

The covariance $\text{cov}(X,Y)$ between two variables is a linear function (behaves like a straight-line function!) of each of its variables.

In maths speak we say it is 'bilinear'. It is linear in the first argument:

\text{cov}(aX,Y) = a\times\text{cov}(X,Y) \qquad \text{cov}(X+Y,Z) = \text{cov}(X,Z) + \text{cov}(Y,Z)

and it's also linear in the second argument:

\text{cov}(X,aY) = a\times\text{cov}(X,Y) \qquad \text{cov}(X,Y+Z) = \text{cov}(X,Y) + \text{cov}(X,Z)

It's also symmetric:

\text{cov}(X,Y) = \text{cov}(Y,X)

Covariance is a measure of the co-linearity of two variables (around their means). It gets bigger the larger the scale of the variables, and bigger the more they tend to move together (after subtracting their means). What's more, the variance of a variable $X$ is just the covariance of $X$ with itself:

\text{var}(X) = \text{cov}(X,X)

The rules above boil down to applying the bilinearity property to the variance, pulling the multiple out of each argument in turn, as in:

\text{var}(aX) = \text{cov}(aX,aX) = a \times a \times \text{cov}(X,X) = a^2 \times \text{var}(X)

which is the first property, and

\begin{align} \text{var}(X+Y) = \text{cov}(X+Y,X+Y) = \text{cov}(X,X) + \text{cov}(Y,Y) + 2\times \text{cov}(X,Y) \end{align}

which is a more general form of the second property. (If $X$ and $Y$ are independent, their covariance is zero, so the last term vanishes and we recover the second property.)
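These identities hold for the sample covariance computed by R's `cov()` too, so they can be checked directly. The sketch below uses deliberately correlated variables so that the covariance term does not vanish; the variable definitions, the multiplier `a`, the sample size and the seed are all assumptions made for illustration.

```r
# Checking bilinearity and the general variance-of-a-sum formula.
# X, Y, Z, a, N and the seed are illustrative assumptions.
set.seed( 3 )
N = 1000
X = rnorm( N )
Y = 0.5 * X + rnorm( N )   # deliberately correlated with X
Z = rnorm( N )
a = 3

# Linearity in the first argument
all.equal( cov( a * X, Y ), a * cov( X, Y ) )
all.equal( cov( X + Y, Z ), cov( X, Z ) + cov( Y, Z ) )

# Symmetry, and variance as the covariance of a variable with itself
all.equal( cov( X, Y ), cov( Y, X ) )
all.equal( var( X ), cov( X, X ) )

# General form of the second property, including the covariance term
all.equal( var( X + Y ), var( X ) + var( Y ) + 2 * cov( X, Y ) )
```

Each `all.equal()` call compares the two sides up to a small numerical tolerance, so each line should report `TRUE`.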

Example

The last formula lets us work out more complex scenarios. For example, we simulated a third variable $C$ as

C = \tfrac{1}{2}A + \tfrac{1}{2} B + \text{noise}

...and we again wanted $C$ to have variance $1$. How much noise variance do we need? The calculation is easy using formula (3):

\begin{align*} \text{var}(\tfrac{1}{2}A + \tfrac{1}{2} B) &= \text{cov}(\tfrac{1}{2}A + \tfrac{1}{2} B, \tfrac{1}{2}A + \tfrac{1}{2} B) \\ &= \text{cov}(\tfrac{1}{2}A, \tfrac{1}{2}A) + \text{cov}(\tfrac{1}{2}B, \tfrac{1}{2}B)+ 2\times \text{cov}(\tfrac{1}{2}A, \tfrac{1}{2}B) \\ & = \tfrac{1}{4}\text{cov}(A,A) + \tfrac{1}{4}\text{cov}(B,B) + \tfrac{1}{2}\text{cov}(A,B) \\ & = \tfrac{1}{4}\text{var}(A) + \tfrac{1}{4}\text{var}(B) + \tfrac{1}{2}\text{cov}(A,B) \end{align*}

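In our simulation $A$ and $B$ both had variance $1$, while we had

\text{cov}(A,B) = \text{cov}(\tfrac{1}{2}B + \text{noise}, B) = \tfrac{1}{2}\text{var}(B) = \tfrac{1}{2}

because of how $A$ was simulated (the noise is independent of $B$, so it contributes no covariance). So this boils down to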

\text{var}(\tfrac{1}{2}A + \tfrac{1}{2} B) = \tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{4} = \frac{3}{4}

In other words, we need to add noise with variance $\tfrac{1}{4}$ to make $C$ have variance $1$ .
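To see the whole calculation at work, here is a small simulation of all three variables. It is a sketch under assumptions: the formulas for $A$ and $C$ follow this page, but the sample size `N`, the seed, and the use of normally distributed noise are choices made here, and it is not the original simulation code.

```r
# Simulating B, A and C with the noise variances worked out above.
# N and the seed are illustrative assumptions.
set.seed( 4 )
N = 100000

B = rnorm( N, mean = 0, sd = 1 )
A = 0.5 * B + rnorm( N, mean = 0, sd = sqrt( 0.75 ) )             # noise variance 3/4
C = 0.5 * A + 0.5 * B + rnorm( N, mean = 0, sd = sqrt( 0.25 ) )   # noise variance 1/4

cov( A, B )                 # should be close to 0.5
var( 0.5 * A + 0.5 * B )    # should be close to 0.75
var( C )                    # should be close to 1
```

As before, the estimates are only approximately equal to the theoretical values because they come from a finite sample.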
