## Sunday, September 10, 2017

### Givens Rotations

The problem

If A is a matrix whose columns are {v1v2v3, ... vn }, then any point in its span can be described with an appropriate choice of x thus:
x =
 . . . v1 v2 v3 . . .
 x1 x2 x3
where x1, x2 ... etc are the multipliers we want of a given dimension summed over all the vectors, vi.

What this means is that any vector, b, can be expressed as A x. Equivalently, if we solve |b - A x| = 0 we're solving the least squares problem, beloved of loss functions in machine learning. "The advantage of an algorithm for least squares is that it can be used even when the matrix-vector equation has no solution" .

One algorithm: Gram Schmidt

The approach used in "Coding the Matrix" is the Gram Schmidt method. The circuitous route goes like this:
1. We want to find the closest point between a vector that we'll call b and a space defined by the span of vectors {v1v2v3, ... vn}. We will assume that vi is orthogonal to vj for all i ≠ j. If they're not orthogonal, it's not hard to turn them into vectors that are but that's another problem.
2. Given a k-dimensional vector space, V, the vector b can be expressed as the sum of two vectors - one that is in V and one that is orthogonal to V. In Klein's notation, this says b = b||V + b⊥V. "The point in V closest to b is b||V and the distance is ||b⊥V|| ... it suffices to find b⊥V, for we can then obtain b||V." .
3. Given the orthogonal set of vectors{v1v2v3, ... vn}, b⊥V equals b with all the projections, <b, vi>, shaved off. This creates a recursive algorithm that looks like:

bi = bi-1 - <bi-1, vi> / <vi, vi>

where b0 = b. For convenience, we say σi = = bi-1 - <bi-1, vi> / <vi, vi>
4. Now b can be expressed as

b = σ0v0 + ... + σk-1vk-1 + b⊥V

which just like the matrix at the top of this post can be expressed like:
b =
 . ... . . v0 ... vk-1 b⊥V . ... . .
 σ0 . . σk-1 1
Similarly, if {v1v2v3, ... vn} are not orthogonal, we can invoke the same recursive algorithm to generate the original vectors from a linear combination of mutually orthogonal vectors, {v1*, v2*, v3*, ... vn*}. It would look like this:
A =
 . . ... . v*1 v*2 ... v*n . . ... .

 1 σ12 ... ... ... σ1n 0 1 σ23 ... ... σ2n 0 0 1 σ34 ... σ3n . . . . . . . . . . 1 σn-1 0 0 0 0 0 1

QR-factorization

QR-factorization is reducing an m x n matix A into two matrices, Q and R where Q is an m x n column orthogonal (ie, the columns are orthogonal and have a unit norm) matrix and R is a triangular matrix . In other words, it looks like the matrix above where R is the upper triangular matrix.

Another algorithm: Givens Rotation

Now, there is another way of achieving the same end. We can rotate the matrix such that a particular cell becomes zero. Keep doing this and we get the upper triangular matrix as above. This is called a Givens rotation. Let's call each rotation Gi then:

R = Gk Gk-1... G1  A

But note that each rotation is of the form:

 1 0 ... ... ... . 0 1 ... ... ... . . . . . . . . . c . -s . . . . . . . . . s . c . . . . . . . 0 0 0 0 0 1

where c2+s2=1, the c values are on the diagonal and all the other diagonals are 1. Or, to put it more succinctly in Python:

def givensRotation(A, i, j):
G = np.eye(len(A))
aij = A[i, j]
ajj = A[j, j]
r = ((aij ** 2) + (ajj ** 2)) ** 0.5
c = ajj / r
s = - aij / r
G[j, i] = -s
G[j, j] = c
G[i, i] = c
G[i, j] = s
return G

Rearranging that equation for R, we get:

A = G1-1 G2-1 ... Gk-1-1 Gk-1  R

but as luck would have it, these G matrices are their own inverses (since c2+s2=1. Try it.). So:

A = G1 G2 ... Gk-1 Gk  R

and all we need to do is keep the original G matrices in the right order and setting

Q = G1 G2 ... Gk-1 Gk

we get:

A = Q R

which is our QR decomposition. Booyah.

Given a triangle represented in Breeze as:

val m = DenseMatrix.zeros[Double](rows = 3, cols = 3)
m(0, ::) := DenseVector(1.0, 2.0, 3.0).t
m(1, ::) := DenseVector(-6.0, 5.0, -4.0).t
m(2, ::) := DenseVector(-7.0, 1.0, 9.0).t

we can transform it with Givens rotations to give us: Red is original, blue is reflection

Distributed Givens Rotation using Spark

I tried this in my own toy linear algebra library for Spark, Algernon. Performance currently sucks but as a proof-of-concept it shows it can be done. I avoid Spark's matrix multiplication code for efficiency. Since the G matrix is mostly 1s in the diagonal and this is effectively a NoOp in matrix multiplication for that particular row, we can write:

val mathOps = implicitly[Numeric[T]]
import session.sqlContext.implicits._
import mathOps.mkNumericOps
ithRow.joinWith(jthRow, '_1 ("j") === '_2 ("j"), "outer").flatMap { case (x, y) =>

def calculate(x: MatrixCell[T], y: MatrixCell[T], _s: T) =
{if (x == null) mathOps.zero else (x.x * c)} + {if (y == null) mathOps.zero else (y.x * _s)}

val newIth = if (x == null) Seq() else Seq(MatrixCell(x.i, x.j, calculate(x, y, s)))
val newJth = if (y == null) Seq() else Seq(MatrixCell(y.i, y.j, calculate(x, y, -s)))

newIth ++ newJth
}

Here, we're only interested in the i-th and j-th rows, multiplying them with the c and s values we have already calculated.

 Coding the Matrix, Klein