Sunday, September 10, 2017

Givens Rotations

The problem

If A is a matrix whose columns are {v1v2v3, ... vn }, then any point in its span can be described with an appropriate choice of x thus:
x = 
where x1, x2 ... etc are the multipliers we want of a given dimension summed over all the vectors, vi.

What this means is that any vector, b, can be expressed as A x. Equivalently, if we solve |b - A x| = 0 we're solving the least squares problem, beloved of loss functions in machine learning. "The advantage of an algorithm for least squares is that it can be used even when the matrix-vector equation has no solution" [1].

One algorithm: Gram Schmidt

The approach used in "Coding the Matrix" is the Gram Schmidt method. The circuitous route goes like this:
  1. We want to find the closest point between a vector that we'll call b and a space defined by the span of vectors {v1v2v3, ... vn}. We will assume that vi is orthogonal to vj for all i ≠ j. If they're not orthogonal, it's not hard to turn them into vectors that are but that's another problem.
  2. Given a k-dimensional vector space, V, the vector b can be expressed as the sum of two vectors - one that is in V and one that is orthogonal to V. In Klein's notation, this says b = b||V + b⊥V. "The point in V closest to b is b||V and the distance is ||b⊥V|| ... it suffices to find b⊥V, for we can then obtain b||V." [1].
  3. Given the orthogonal set of vectors{v1v2v3, ... vn}, b⊥V equals b with all the projections, <b, vi>, shaved off. This creates a recursive algorithm that looks like:

    bi = bi-1 - <bi-1, vi> / <vi, vi>

    where b0 = b. For convenience, we say σi = = bi-1 - <bi-1, vi> / <vi, vi>
  4. Now b can be expressed as

    b = σ0v0 + ... + σk-1vk-1 + b⊥V

    which just like the matrix at the top of this post can be expressed like:
    b = 
Similarly, if {v1v2v3, ... vn} are not orthogonal, we can invoke the same recursive algorithm to generate the original vectors from a linear combination of mutually orthogonal vectors, {v1*, v2*, v3*, ... vn*}. It would look like this:
A = 



QR-factorization is reducing an m x n matix A into two matrices, Q and R where Q is an m x n column orthogonal (ie, the columns are orthogonal and have a unit norm) matrix and R is a triangular matrix [1]. In other words, it looks like the matrix above where R is the upper triangular matrix.

Another algorithm: Givens Rotation

Now, there is another way of achieving the same end. We can rotate the matrix such that a particular cell becomes zero. Keep doing this and we get the upper triangular matrix as above. This is called a Givens rotation. Let's call each rotation Gi then:

R = Gk Gk-1... G1  A

But note that each rotation is of the form:


where c2+s2=1, the c values are on the diagonal and all the other diagonals are 1. Or, to put it more succinctly in Python:

def givensRotation(A, i, j):
    G = np.eye(len(A))
    aij = A[i, j]
    ajj = A[j, j]
    r = ((aij ** 2) + (ajj ** 2)) ** 0.5
    c = ajj / r
    s = - aij / r
    G[j, i] = -s
    G[j, j] = c
    G[i, i] = c
    G[i, j] = s
    return G

Rearranging that equation for R, we get:

A = G1-1 G2-1 ... Gk-1-1 Gk-1  R

but as luck would have it, these G matrices are their own inverses (since c2+s2=1. Try it.). So:

A = G1 G2 ... Gk-1 Gk  R

and all we need to do is keep the original G matrices in the right order and setting

Q = G1 G2 ... Gk-1 Gk

we get:

A = Q R

which is our QR decomposition. Booyah.

Given a triangle represented in Breeze as:

    val m = DenseMatrix.zeros[Double](rows = 3, cols = 3)
    m(0, ::) := DenseVector(1.0, 2.0, 3.0).t
    m(1, ::) := DenseVector(-6.0, 5.0, -4.0).t
    m(2, ::) := DenseVector(-7.0, 1.0, 9.0).t

we can transform it with Givens rotations to give us:
Red is original, blue is reflection

Distributed Givens Rotation using Spark

I tried this in my own toy linear algebra library for Spark, Algernon. Performance currently sucks but as a proof-of-concept it shows it can be done. I avoid Spark's matrix multiplication code for efficiency. Since the G matrix is mostly 1s in the diagonal and this is effectively a NoOp in matrix multiplication for that particular row, we can write:

    val mathOps = implicitly[Numeric[T]]
    import session.sqlContext.implicits._
    import mathOps.mkNumericOps
    ithRow.joinWith(jthRow, '_1 ("j") === '_2 ("j"), "outer").flatMap { case (x, y) =>

      def calculate(x: MatrixCell[T], y: MatrixCell[T], _s: T) =
        {if (x == null) else (x.x * c)} + {if (y == null) else (y.x * _s)}

      val newIth = if (x == null) Seq() else Seq(MatrixCell(x.i, x.j, calculate(x, y, s)))
      val newJth = if (y == null) Seq() else Seq(MatrixCell(y.i, y.j, calculate(x, y, -s)))

      newIth ++ newJth

Here, we're only interested in the i-th and j-th rows, multiplying them with the c and s values we have already calculated.

[1] Coding the Matrix, Klein

No comments:

Post a Comment