The problem
If A is a matrix whose columns are {v_{1}, v_{2}, v_{3}, ... v_{n} }, then any point in its span can be described with an appropriate choice of x thus:
A x = x_{1}v_{1} + x_{2}v_{2} + ... + x_{n}v_{n}
What this means is that any vector b in the span can be expressed as A x for some choice of x. Equivalently, if we minimize ‖b − A x‖ we're solving the least squares problem, beloved of loss functions in machine learning. "The advantage of an algorithm for least squares is that it can be used even when the matrix-vector equation has no solution" [1].
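For instance, numpy's least squares solver happily handles an overdetermined system (a minimal sketch; the matrix and vector are made up for illustration):

import numpy as np

# Three equations, two unknowns: A x = b has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# lstsq minimizes ||b - A x|| even though no x solves A x = b exactly.
x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)         # the least squares solution
print(residual)  # squared norm of b - A x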
One algorithm: Gram-Schmidt
The approach used in "Coding the Matrix" is the Gram-Schmidt method. The circuitous route goes like this:
We want to find the point closest to a vector, b, in a space defined by the span of the vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}. We will assume that v_{i} is orthogonal to v_{j} for all i ≠ j. If they're not orthogonal, it's not hard to turn them into vectors that are, but that's another problem.
Given a k-dimensional vector space, V, the vector b can be expressed as the sum of two vectors, one that is in V and one that is orthogonal to V. In Klein's notation, this says b = b^{V} + b^{⊥V}. "The point in V closest to b is b^{V} and the distance is ‖b^{⊥V}‖ ... it suffices to find b^{⊥V}, for we can then obtain b^{V}." [1]
Given the orthogonal set of vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}, b^{⊥V} equals b with all the projections, <b, v_{i}>, shaved off. This creates a recursive algorithm that looks like:
b_{i} = b_{i−1} − (<b_{i−1}, v_{i}> / <v_{i}, v_{i}>) v_{i}
where b_{0} = b. For convenience, we say σ_{i} = <b_{i−1}, v_{i}> / <v_{i}, v_{i}>, so each step shaves σ_{i}v_{i} off the previous remainder, and b_{n} = b^{⊥V}. Now b can be expressed as
b = σ_{1}v_{1} + ... + σ_{n}v_{n} + b^{⊥V}
which, just like the matrix at the top of this post, can be expressed as:
b = [v_{1} ... v_{n} b^{⊥V}] [σ_{1} ... σ_{n} 1]^{T}
that is, the matrix whose columns are v_{1}, ..., v_{n}, b^{⊥V} multiplied by the vector of coefficients (σ_{1}, ..., σ_{n}, 1).
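Here is a minimal sketch of that recursion in Python (numpy; the function name projectOrthogonal is my own, not Klein's):

import numpy as np

def projectOrthogonal(b, vs):
    # Shave the projection onto each (mutually orthogonal) v off b.
    for v in vs:
        sigma = np.dot(b, v) / np.dot(v, v)  # sigma_i = <b_{i-1}, v_i> / <v_i, v_i>
        b = b - sigma * v                    # b_i = b_{i-1} - sigma_i v_i
    return b

The value returned is b^{⊥V}, and the sigma values computed along the way are exactly the coefficients in the expansion above.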
Similarly, if {v_{1}, v_{2}, v_{3}, ... v_{n}} are not orthogonal, we can invoke the same recursive algorithm to generate the original vectors from a linear combination of mutually orthogonal vectors, {v_{1}*, v_{2}*, v_{3}*, ... v_{n}*}. It would look like this:
[v_{1} ... v_{n}] = [v_{1}* ... v_{n}*] R
where each v_{i} is a combination of only v_{1}*, ..., v_{i}*, so the matrix of coefficients, R, is upper triangular.
QR-factorization
QR-factorization is reducing an m × n matrix A into two matrices, Q and R, where Q is an m × n column-orthogonal matrix (i.e., the columns are mutually orthogonal and have unit norm) and R is a triangular matrix [1]. In other words, it looks like the matrix above where R is the upper triangular matrix.
A = Q R
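Gram-Schmidt delivers exactly this factorization: normalize the starred vectors to get the columns of Q, and the shaved-off coefficients fill in the upper triangle of R. A sketch of my own in numpy (assuming A has full column rank, so no division by zero):

import numpy as np

def gramSchmidtQR(A):
    # Orthogonalize the columns of A, recording the coefficients in R.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = np.dot(Q[:, j], A[:, i])  # projection onto an earlier column
            v -= R[j, i] * Q[:, j]              # shave the projection off, as above
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]                   # unit-norm column of Q
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = gramSchmidtQR(A)
print(np.allclose(A, Q @ R))  # True: the factors recover A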
Now, there is another way of achieving the same end. We can rotate the matrix such that a particular cell becomes zero. Keep doing this and we get the upper triangular matrix as above. This is called a Givens rotation. Let's call each rotation G_{i}; then:
R = G_{k} G_{k−1} ... G_{1} A
But note that each rotation is of the form:
| 1  0  .  .  .  . |
| 0  1  .  .  .  . |
| .  .  .  .  .  . |
| .  .  c  .  s  . |
| .  .  .  .  .  . |
| .  .  s  . −c  . |
| .  .  .  .  .  . |
| 0  0  0  0  0  1 |
where c^{2}+s^{2}=1, the c and −c values sit on the diagonal in rows i and j, the s values sit at (i, j) and (j, i), and all the other diagonal entries are 1. Or, to put it more succinctly in Python:
import numpy as np

def givensRotation(A, i, j):
    # Start from the identity and overwrite the entries at rows/columns i and j
    # so that G @ A zeroes out the cell A[i, j].
    G = np.eye(len(A))
    aij = A[i, j]
    ajj = A[j, j]
    r = ((aij ** 2) + (ajj ** 2)) ** 0.5
    c = ajj / r
    s = -aij / r
    # The 2x2 block [c s; s -c] is symmetric and orthogonal: a reflection
    # that is its own inverse.
    G[j, i] = s
    G[j, j] = -c
    G[i, i] = c
    G[i, j] = s
    return G
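For example (values picked arbitrarily):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
G = givensRotation(A, 1, 0)           # zero out the cell at row 1, column 0
print((G @ A)[1, 0])                  # ~0.0
print(np.allclose(G @ G, np.eye(3)))  # True: G is its own inverse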
Inverting the rotations to recover A gives:
A = G_{1}^{−1} G_{2}^{−1} ... G_{k−1}^{−1} G_{k}^{−1} R
but as luck would have it, these G matrices are their own inverses (since c^{2}+s^{2}=1 and each G is symmetric. Try it.). So:
A = G_{1} G_{2} ... G_{k−1} G_{k} R
and all we need to do is keep the original G matrices in the right order. Setting
Q = G_{1} G_{2} ... G_{k−1} G_{k}
we get:
A = Q R
which is our QR decomposition. Booyah.
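Putting it together (a sketch of my own, reusing the givensRotation above; the guard skips cells that are already zero):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
R = A.copy()
Q = np.eye(3)
# Sweep column by column, zeroing everything below the diagonal.
for j in range(R.shape[1]):
    for i in range(j + 1, R.shape[0]):
        if R[i, j] != 0.0:
            G = givensRotation(R, i, j)
            R = G @ R  # R = G_k ... G_1 A
            Q = Q @ G  # Q = G_1 ... G_k, since each G is its own inverse
print(np.allclose(A, Q @ R))       # True: we recover A
print(np.allclose(np.triu(R), R))  # True: R is upper triangular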
Given a triangle represented in Breeze as (assuming m was declared as a 3 × 3 DenseMatrix[Double], one vertex per row):
import breeze.linalg.{DenseMatrix, DenseVector}

val m = DenseMatrix.zeros[Double](3, 3)
m(0, ::) := DenseVector(1.0, 2.0, 3.0).t
m(1, ::) := DenseVector(6.0, 5.0, 4.0).t
m(2, ::) := DenseVector(7.0, 1.0, 9.0).t
we can transform it with Givens rotations to give us:
[Figure: the triangle before and after the transformation. Red is original, blue is reflection.]
Distributed Givens Rotation using Spark
I tried this in my own toy linear algebra library for Spark, Algernon. Performance currently sucks, but as a proof-of-concept it shows it can be done. I avoid Spark's matrix multiplication code for efficiency. Since the G matrix is mostly 1s on the diagonal, multiplying by it is effectively a no-op for every row but the ith and jth, so we only need to transform those two rows:
val mathOps = implicitly[Numeric[T]]
import session.sqlContext.implicits._
import mathOps.mkNumericOps

// c and s have already been calculated from the pivot cells; minusC is -c.
// The outer join lines up the two rows' cells by column index, "j".
ithRow.joinWith(jthRow, '_1 ("j") === '_2 ("j"), "outer").flatMap { case (x, y) =>
  // A linear combination of a cell from each row; a missing (null) cell counts as zero.
  def calculate(x: MatrixCell[T], y: MatrixCell[T], _c: T, _s: T) =
    {if (x == null) mathOps.zero else (x.x * _c)} + {if (y == null) mathOps.zero else (y.x * _s)}

  // New ith row: c * row_i + s * row_j; new jth row: s * row_i - c * row_j.
  val newIth = if (x == null) Seq() else Seq(MatrixCell(x.i, x.j, calculate(x, y, c, s)))
  val newJth = if (y == null) Seq() else Seq(MatrixCell(y.i, y.j, calculate(x, y, s, minusC)))
  newIth ++ newJth
}
Here, we're only interested in the ith and jth rows, multiplying them by the c and s values we have already calculated.
[1] Philip Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science.