Sunday, October 25, 2020

Covariance and Precision Matrices

My aim is to write an advanced maths book that uses just pictures :) This is my attempt at the concepts of covariance and precision.

Example 1: Student IQs

Let's take a class of 100 students and make 10 measurements for each of them, with the measurements being Gaussian distributed. Then, our data looks like this:

Data Heatmap

[Python code lives here].
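The original code isn't shown, but here's a minimal sketch of how such data might be generated (the N(100, 15²) student IQs and the N(0, 5²) measurement noise are my assumptions, not necessarily the original parameters):

```python
import numpy as np

rng = np.random.default_rng(42)

n_students, n_measurements = 100, 10

# Each student has a "true" IQ drawn from N(100, 15^2); each of their
# measurements is that value plus a little N(0, 5^2) noise.
true_iqs = rng.normal(100, 15, size=(n_students, 1))
data = true_iqs + rng.normal(0, 5, size=(n_students, n_measurements))
```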

We can see that each student's measurements roughly conform to a single value, but there is large variation across students.

Now, we build the covariance matrix where:

  1. the expected value (mean) of each row is subtracted from every value in that row's vector
  2. each resulting vector is multiplied with every other vector, and each pair of rows forms an index into our matrix
  3. each cell is multiplied by the probability of that combination occurring. This could be zero in some cases, but for our use case the distribution over which student and which measurement we might choose is uniform.
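The steps above can be sketched in NumPy (the data-generation line is just a stand-in for Example 1's data):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, size=(100, 1)) + rng.normal(0, 5, size=(100, 10))

# Step 1: subtract each row's mean from that row.
centred = data - data.mean(axis=1, keepdims=True)

# Steps 2-3: the products of all row pairs, each weighted uniformly
# by 1/n since every measurement is equally likely.
cov = centred @ centred.T / data.shape[1]
```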

Note, regarding covariance as a measurement:

"Covariance is useful in some computations, but it is seldom reported as a summary statistic because it is hard to interpret. Among other problems, its units are the product of the units of X and Y. So the covariance of weight and height might be in units of kilogram-meters, which doesn’t mean much" - Think Stats

Anyway, our covariance matrix looks like this:

Covariance matrix heatmap

It should be immediately obvious that
  1. each row has a strong covariance with itself (the diagonal)
  2. the matrix is symmetric (since the covariance formula is symmetric)
  3. all cells other than the diagonal are just noise.

The inverse of the covariance matrix is called the precision matrix. It looks like this:

Precision matrix heatmap

This represents the conditional dependence between one student's measurements and another's, and not surprisingly, in our particular use case, the matrix is full of zeroes (plus some noise). In other words, the measurements for all students are independent. And why shouldn't they be? Given the measurements for student X, what does that have to do with the measurements for student Y? This might not be true for other datasets as we'll see in the next example.
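A sketch of getting the precision matrix from the covariance. One caveat: with 100 students but only 10 measurements each, the 100 × 100 sample covariance is singular, so this sketch flips the dimensions (10 students, 100 measurements each) to keep it invertible:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 independent students, 100 measurements each (fewer students than
# measurements, so the sample covariance matrix is invertible).
data = rng.normal(100, 15, size=(10, 1)) + rng.normal(0, 5, size=(10, 100))

cov = np.cov(data)                  # 10 x 10 covariance across students
precision = np.linalg.inv(cov)

# The off-diagonal precision entries are small noise around zero:
# the students' measurements are conditionally independent.
```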

Example 2: A Stock Price

To illustrate this, I've written some Python code based on this excellent post from Prof Matthew Stephens. He asks us to imagine a random walk where on each step we add a value drawn from a Gaussian distribution. The output might look like this:

Random walk


Note that just adding Gaussians qualifies as a Markov chain, as the next value depends only on the previous one. That is:

X1 = N(μ, σ²)
X2 = X1 + N(μ, σ²)
X3 = X2 + N(μ, σ²)
...
Xn = Xn-1 + N(μ, σ²)

We rewrite this so that our random variable, Z, is a vector of values drawn from N(μ, σ²), and note that a random variable is neither random nor a variable in the programming sense of the words. Its values may have been drawn from a random distribution, but once we have them they are fixed, as you can see from our Python code.

Anyway, once we've done this, we can represent the above set of equations as matrices:

X = A Z

where A is just an n × n lower-triangular matrix whose entries are 1 on and below the diagonal and 0 everywhere else (try it).
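A minimal sketch of this construction (using zero-mean, unit-variance steps for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Z: n i.i.d. Gaussian steps; A: lower-triangular matrix of ones.
Z = rng.normal(0, 1, size=n)
A = np.tril(np.ones((n, n)))

# The random walk: X_i = Z_1 + ... + Z_i, i.e. a cumulative sum of Z.
X = A @ Z
```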

The covariance matrix of X tells us how each of the elements are related to each other and looks like this:

Covariance Matrix of the Random Walk


Unsurprisingly, values are similar to their neighbours (near the diagonal) and less similar to those further away (bottom-left and top-right corners). We'd expect that with a random walk.

[Just a note on covariance matrices: each row of n elements is centred on its mean. The resulting matrix is then multiplied by its transpose and divided by n - 1 (the -1 is Bessel's correction, which comes from the fact that we're sampling rather than using the entire population). But this depends on a certain interpretation of what a row is. If each row is taken from the same distribution, happy days.]
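That recipe is exactly what NumPy's np.cov computes, which we can sanity-check on some arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(2)
rows = rng.normal(size=(4, 50))

# Centre each row on its mean, multiply by the transpose, and divide
# by n - 1 (Bessel's correction).
centred = rows - rows.mean(axis=1, keepdims=True)
manual = centred @ centred.T / (rows.shape[1] - 1)
```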

The precision matrix looks like this (zoomed into the top 10 x 10 sub matrix):

Precision Matrix of a Markov Chain

and this gives us an intuitive feeling for what the precision matrix is all about: "The key property of the precision matrix is that its zeros tell you about conditional independence. Specifically, [its values are] 0 if and only if Xi and Xj are conditionally independent given all other coordinates of X." [here] And this is exactly what we see: each point depends only on its immediate neighbours - the very definition of a Markov process.
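We can check the tridiagonal structure directly from the population covariance of the walk, which is AAᵀ for unit-variance steps:

```python
import numpy as np

n = 6
A = np.tril(np.ones((n, n)))

# Population covariance of the walk (unit-variance steps): A A^T.
cov = A @ A.T
precision = np.linalg.inv(cov)

# The precision matrix is tridiagonal: each X_i depends only on its
# immediate neighbours - exactly the Markov property.
mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > 1
```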
