Tuesday, August 30, 2016

Refresher on Eigenvalues and Eigenvectors



A lot of graph analysis relies heavily on linear algebra, so here's a quick refresher on material you'll have studied in your undergraduate days.

In the mathematics of graphs, you'll see mention of eigenvalues and eigenvectors. The general description of them (as in that Wikipedia link) focuses on the rotation, translation and shearing of images. Although absolutely correct, this takes the emphasis away from their most interesting quality: we're solving sets of equations with them.

A set of equations formed of constants and variables, where the variables are never raised to a power greater than one, is called a set of linear equations.

Representation

Let's take a straightforward set of simultaneous equations:

x - y + 4z = 5
2x - 3y + 8z = 4
x - 2y + 4z = 9

We can represent the coefficients and constants as an augmented matrix:

1  -1   4 |  5
2  -3   8 |  4
1  -2   4 |  9

Now, we can produce a row reduced matrix by employing any combination of:

  • interchanging two rows
  • multiplying a row by a non-zero constant
  • adding one row to another

to give us:

1   0   4 |  11
0  -1   0 |  -6
0   0   0 | -20

where we'll call the entire augmented matrix A and the coefficient columns (everything left of the bar) matrix M. The rank (that is, the number of non-zero rows after all row reduction) is greater in A than in M, indicating that the equations have no solution (since no values of x, y and z can make 0x + 0y + 0z equal the -20 in the last row). The system is inconsistent.

The rule here is:
  1. rank M < rank A: no solution
  2. rank M = rank A = n, the number of unknowns: exactly one solution
  3. rank M = rank A = R < n: R unknowns can be expressed in terms of the remaining (n - R) unknowns.
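
We needn't row reduce by hand to check these conditions; NumPy's matrix_rank will do it for us. A minimal sketch for the first system above:

import numpy as np

# The coefficient matrix M and the augmented matrix A from the system above
M = np.array([[1, -1, 4],
              [2, -3, 8],
              [1, -2, 4]])
A = np.column_stack([M, [5, 4, 9]])

print("rank M =", np.linalg.matrix_rank(M))  # 2
print("rank A =", np.linalg.matrix_rank(A))  # 3: rank M < rank A, so no solution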
An example:

x + y - z = 7
2x - y - 5z = 2
-5x + 4y + 14z = 1
3x - y - 7z = 5

when row reduced becomes:

1   0  -2 |  3
0   1   1 |  4
0   0   0 |  0
0   0   0 |  0

Since this is option #3, with R = 2 and n = 3, it's not surprising that we can express the unknowns in terms of one of them (z, say):

x = 3 + 2z
y = 4 - z

which, with a bit of Python, Matplotlib and some help from here, we can plot like this

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 - registers the 3D projection

mpl.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Parameterise the solutions by z, then express x and y in terms of it
z = np.linspace(-10, 10, 10)
x = 3 + (2 * z)
y = 4 - z

ax.plot(x, y, z, label='x = 3 + 2z; y = 4 - z')
ax.legend()

plt.show()

tells us that there are infinitely many solutions - that is, any point on the line the code plots.


Determinants

Like the use of images to demonstrate the properties of matrices, determinants are often described geometrically by the way they define volumes (although there are other definitions). "For an n × n matrix with columns a1...an, the value of det A is the signed volume of the parallelepiped defined by the vectors a1...an." [2] This is similar to calculating the cross product. In fact, the triple product, which uses both the cross product and the dot product, is how to find the volume in 3 dimensions.
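
We can check this correspondence numerically. A minimal sketch in NumPy (the three vectors are arbitrary examples of mine):

import numpy as np

# Three vectors spanning a parallelepiped
a1 = np.array([1., 0., 0.])
a2 = np.array([1., 2., 0.])
a3 = np.array([1., 1., 3.])

# The determinant of the matrix with columns a1, a2, a3...
print(np.linalg.det(np.column_stack([a1, a2, a3])))  # 6.0

# ...agrees (up to sign) with the scalar triple product a1 . (a2 x a3)
print(np.dot(a1, np.cross(a2, a3)))                  # 6.0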

But, this is only part of the story. Determinants are interesting because of what they tell us about the matrix.

Calculating determinants is tedious. Thankfully, since my undergraduate days, there have been many libraries written to do it for me (floating-point caveats aside). Now, we just run something like:

import numpy as np

identity = np.eye(3)
print "det(I3) = ", np.linalg.det(identity)

which tells me the (not too surprising) fact that the determinant of the 3 × 3 identity matrix over ℝ is 1.0.

More interestingly:

det AB = det BA = (det A) × (det B)
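
We can convince ourselves of this numerically; a quick sketch with arbitrary random matrices:

import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

# det is multiplicative: det(AB) = det(BA) = (det A) x (det B)
print(np.allclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.allclose(np.linalg.det(B @ A), np.linalg.det(A) * np.linalg.det(B)))  # True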

Why is this interesting? Well, if A is a matrix (and matrices can be treated as functions) that has an inverse:

A A⁻¹ = I
det(A A⁻¹) = det(A) × det(A⁻¹) = det I = 1

If the product of terms equals 1, no term can be 0. So, for an inverse matrix to exist, a prerequisite is that the matrix's determinant must be non-zero.
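
To see this in practice, here's a minimal sketch with a deliberately singular matrix:

import numpy as np

singular = np.array([[1., 2.],
                     [2., 4.]])  # the second row is twice the first

print(np.linalg.det(singular))  # 0.0, so no inverse can exist

try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as e:
    print(e)  # numpy refuses: "Singular matrix"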

Homogeneous Equations

When the constants on the right hand side of all linear equations in the set are 0, we call them homogeneous equations.

Unlike the equations at the top of this post, "homogeneous equations are never inconsistent; they always have the solution 'all unknowns = 0' (often called the 'trivial solution'). If the number of independent equations (that is, the rank of the matrix) is the same as the number of unknowns, this is the only solution." [1] That is, there are enough independent constraints to force every unknown to zero (see here for a nice demo).

"If the rank of the matrix is less than the number of unknowns, there are infinitely many solutions." [1] That is, we can only express unknowns in terms of other unknowns as there are no constants (try it!). However, the infinite solutions describe a line, a plane, a hyper-plane (depending on the number of dimensions in which we're working) and we can take unit-vectors over these spaces.

"A system of n homogeneous equations in n unknowns has solutions other than the trivial solution if and only if the determinant of the coefficients is zero" [1]

Eigen- vectors and values

Which finally brings us to eigen- vectors and values. What they are is simple. Again in geometric terms, we can consider them as vectors whose direction does not change under a transformation by a matrix but whose magnitude might.

Why they are interesting is more subtle. They come in handy in problems as disparate as Latent Semantic Analysis (where we want to know which documents contain which 'concepts'), determining if a graph is acyclic, producing an equation for the n-th element of the Fibonacci sequence (see Coding the Matrix, chapter 12), and finding the relationships between distributions in a multivariate Gaussian. In short, the "why" is very domain-specific.

Let's take their definition:

M v = λ v

(where M is a matrix, v an eigenvector and λ an eigenvalue - there may be more than one such pair).
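
NumPy will find the eigenpairs for us. A minimal sketch with an arbitrary diagonal matrix, verifying the definition:

import numpy as np

M = np.array([[2., 0.],
              [0., 3.]])
values, vectors = np.linalg.eig(M)

# Each column of `vectors` is an eigenvector; check M v = lambda v for each pair
for lam, v in zip(values, vectors.T):
    print(lam, np.allclose(M @ v, lam * v))  # 2.0 True, 3.0 True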

Not all matrices have eigen-vectors and -values (over the field on which the matrix acts - see here for more information). Again from a geometric point of view, if we imagine the matrix to represent a rotation, no vector stays pointing in the same direction (note: there may still be solutions but the eigenpairs will be made of complex numbers).
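
We can see this with a 90-degree rotation matrix, for which NumPy happily returns complex eigenvalues:

import numpy as np

theta = np.pi / 2  # a 90-degree rotation leaves no real vector pointing the same way
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eig(R)[0])  # approximately [1j, -1j]: complex eigenvalues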

Not all eigenvectors are orthogonal. In general they are not, although the eigenvectors of a symmetric matrix are.
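
A quick numerical check of that last claim, using numpy.linalg.eigh (which is designed for symmetric matrices):

import numpy as np

S = np.array([[2., 1.],
              [1., 2.]])  # symmetric
_, vectors = np.linalg.eigh(S)

# The eigenvectors (the columns) are orthogonal: their dot product is ~0
print(np.dot(vectors[:, 0], vectors[:, 1]))  # 0.0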

[1] Mathematical Methods in the Physical Sciences - Boas
[2] Coding the Matrix - Klein

Saturday, August 20, 2016

Louvain Modularity and Graph Modularity


Looking at ways to solve my problem of agglomerative clusters when trying to de-dupe data, I turned to community detection algorithms.

Louvain Modularity is a heuristic that tells you about communities in a network (implementations can be found here). The original paper ("Fast unfolding of communities in large networks") can be found here. The first equation comes without much of an introduction, so I scuttled off to find where it came from. The natural choice was a brilliant book by the equation's creator: Networks: An Introduction by Mark Newman*.

Newman points out the difficulty in coming up with an algorithm that finds communities. If you want to minimise the number of edges crossing a boundary between two groups, then a naive algorithm will cheerfully put no nodes in one group and all N nodes in the other. So, you might want to tell the algorithm to weight the results by multiplying them by the product of the numbers in each group. This is heading in the right direction, as a 0-N split will be the least favourable. In fact, with a bit of simple calculus (the product n(N - n) has derivative N - 2n, which is zero at n = N/2), you can find that the maximum is an even N/2 split. This is nice but arbitrary.

So, one metric we could use is the difference between the actual number of edges connecting nodes of the same class and the expected number.

The actual number of edges joining nodes of the same class (summing over the edges (i,j)) is given by:

Σedges(i,j) δ(ci,cj) = ½ Σi,j Aij δ(ci,cj)

Where δ(ci,cj) is the Kronecker delta (which is simply 1 if ci = cj else 0), A is our adjacency matrix and the ½ comes because we don't want to double-count all the edges in the adjacency matrix.

For the expected number, imagine node i, which has degree ki. If the graph has m edges, there are obviously 2m edge-ends. Therefore, for any outgoing edge from i, the chance that its other end lands on node j is kj/2m and, over all ki outgoing edges from i, the expected number of edges to j is ki kj/2m.

Add them all up, and the expected number is:

½ Σi,j (ki kj / 2m) δ(ci,cj)

where, again, the ½ prevents us from double-counting.

Now, we take the difference and divide by m (since we're interested in fractions, not absolute numbers) and we get:

Q = (1/2m) Σi,j (Aij - ki kj / 2m) δ(ci,cj)

which is the equation in the paper.
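
As a sanity check, here's a minimal sketch of the equation for a toy graph of two triangles joined by a single edge (the graph and the community labels are my own invention for illustration):

import numpy as np

# Two triangles (nodes 0-2 and 3-5) joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

c = np.array([0, 0, 0, 1, 1, 1])  # community label of each node
k = A.sum(axis=1)                 # node degrees, ki
m = A.sum() / 2                   # number of edges

# Q = (1/2m) sum_ij (Aij - ki kj / 2m) delta(ci, cj)
delta = c[:, None] == c[None, :]  # the Kronecker delta as a boolean matrix
Q = ((A - np.outer(k, k) / (2 * m)) * delta).sum() / (2 * m)
print(Q)  # ~0.357: comfortably positive, as we'd hope for this obvious split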

* I really can't recommend this book enough. If you've got an undergraduate degree in something mathematically related, you'll find the equations no problem and they are presented in an easy-going manner.