The problem
If A is a matrix whose columns are {v_{1}, v_{2}, v_{3}, ... v_{n} }, then any point in its span can be described with an appropriate choice of x thus:
A x = x_{1}v_{1} + x_{2}v_{2} + ... + x_{n}v_{n}
What this means is that any vector b in the span can be expressed as A x for some choice of x. Equivalently, if we minimize ‖b − A x‖ we're solving the least squares problem, beloved of loss functions in machine learning. "The advantage of an algorithm for least squares is that it can be used even when the matrix-vector equation has no solution" [1].
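For instance, numpy's least squares solver happily handles an overdetermined system (a minimal sketch; the matrix and vector are made up for illustration):

import numpy as np

# Three equations, two unknowns: A x = b has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# lstsq minimizes ||b - A x|| even though no x solves A x = b exactly.
x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)         # the least squares solution
print(residual)  # squared norm of b - A x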
One algorithm: Gram-Schmidt
The approach used in "Coding the Matrix" is the Gram-Schmidt method. The circuitous route goes like this:
We want to find the point closest to a vector, b, in a space defined by the span of the vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}. We will assume that v_{i} is orthogonal to v_{j} for all i ≠ j. If they're not orthogonal, it's not hard to turn them into vectors that are, but that's another problem.
Given a k-dimensional vector space, V, the vector b can be expressed as the sum of two vectors, one that is in V and one that is orthogonal to V. In Klein's notation, this says b = b^{V} + b^{⊥V}. "The point in V closest to b is b^{V} and the distance is ‖b^{⊥V}‖ ... it suffices to find b^{⊥V}, for we can then obtain b^{V}." [1]
Given the orthogonal set of vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}, b^{⊥V} equals b with all the projections, <b, v_{i}>, shaved off. This creates a recursive algorithm that looks like:
b_{i} = b_{i−1} − (<b_{i−1}, v_{i}> / <v_{i}, v_{i}>) v_{i}
where b_{0} = b. For convenience, we say σ_{i} = <b_{i−1}, v_{i}> / <v_{i}, v_{i}>, so each step shaves σ_{i}v_{i} off the previous remainder, and b_{n} = b^{⊥V}. Now b can be expressed as
b = σ_{1}v_{1} + ... + σ_{n}v_{n} + b^{⊥V}
which, just like the matrix at the top of this post, can be expressed as:
b = [v_{1} ... v_{n} b^{⊥V}] [σ_{1} ... σ_{n} 1]^{T}
that is, the matrix whose columns are v_{1}, ..., v_{n}, b^{⊥V} multiplied by the vector of coefficients (σ_{1}, ..., σ_{n}, 1).
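Here is a minimal sketch of that recursion in Python (numpy; the function name projectOrthogonal is my own, not Klein's):

import numpy as np

def projectOrthogonal(b, vs):
    # Shave the projection onto each (mutually orthogonal) v off b.
    for v in vs:
        sigma = np.dot(b, v) / np.dot(v, v)  # sigma_i = <b_{i-1}, v_i> / <v_i, v_i>
        b = b - sigma * v                    # b_i = b_{i-1} - sigma_i v_i
    return b

The value returned is b^{⊥V}, and the sigma values computed along the way are exactly the coefficients in the expansion above.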
Similarly, if {v_{1}, v_{2}, v_{3}, ... v_{n}} are not orthogonal, we can invoke the same recursive algorithm to generate the original vectors from a linear combination of mutually orthogonal vectors, {v_{1}*, v_{2}*, v_{3}*, ... v_{n}*}. It would look like this:
[v_{1} ... v_{n}] = [v_{1}* ... v_{n}*] R
where each v_{i} is a combination of only v_{1}*, ..., v_{i}*, so the matrix of coefficients, R, is upper triangular.
QR-factorization
QR-factorization is reducing an m × n matrix A into two matrices, Q and R, where Q is an m × n column-orthogonal matrix (i.e., the columns are mutually orthogonal and have unit norm) and R is a triangular matrix [1]. In other words, it looks like the matrix above where R is the upper triangular matrix.
A = Q R
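Gram-Schmidt delivers exactly this factorization: normalize the starred vectors to get the columns of Q, and the shaved-off coefficients fill in the upper triangle of R. A sketch of my own in numpy (assuming A has full column rank, so no division by zero):

import numpy as np

def gramSchmidtQR(A):
    # Orthogonalize the columns of A, recording the coefficients in R.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = np.dot(Q[:, j], A[:, i])  # projection onto an earlier column
            v -= R[j, i] * Q[:, j]              # shave the projection off, as above
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]                   # unit-norm column of Q
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = gramSchmidtQR(A)
print(np.allclose(A, Q @ R))  # True: the factors recover A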
Now, there is another way of achieving the same end. We can rotate the matrix such that a particular cell becomes zero. Keep doing this and we get the upper triangular matrix as above. This is called a Givens rotation. Let's call each rotation G_{i}; then:
R = G_{k} G_{k−1} ... G_{1} A
But note that each rotation is of the form:
| 1  0  .  .  .  . |
| 0  1  .  .  .  . |
| .  .  .  .  .  . |
| .  .  c  .  s  . |
| .  .  .  .  .  . |
| .  .  s  . −c  . |
| .  .  .  .  .  . |
| 0  0  0  0  0  1 |
where c^{2}+s^{2}=1, the c and −c values sit on the diagonal in rows i and j, the s values sit at (i, j) and (j, i), and all the other diagonal entries are 1. Or, to put it more succinctly in Python:
import numpy as np

def givensRotation(A, i, j):
    # Start from the identity and overwrite the entries at rows/columns i and j
    # so that G @ A zeroes out the cell A[i, j].
    G = np.eye(len(A))
    aij = A[i, j]
    ajj = A[j, j]
    r = ((aij ** 2) + (ajj ** 2)) ** 0.5
    c = ajj / r
    s = -aij / r
    # The 2x2 block [c s; s -c] is symmetric and orthogonal: a reflection
    # that is its own inverse.
    G[j, i] = s
    G[j, j] = -c
    G[i, i] = c
    G[i, j] = s
    return G
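For example (values picked arbitrarily):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
G = givensRotation(A, 1, 0)           # zero out the cell at row 1, column 0
print((G @ A)[1, 0])                  # ~0.0
print(np.allclose(G @ G, np.eye(3)))  # True: G is its own inverse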
Inverting the rotations to recover A gives:
A = G_{1}^{−1} G_{2}^{−1} ... G_{k−1}^{−1} G_{k}^{−1} R
but as luck would have it, these G matrices are their own inverses (since c^{2}+s^{2}=1 and each G is symmetric. Try it.). So:
A = G_{1} G_{2} ... G_{k−1} G_{k} R
and all we need to do is keep the original G matrices in the right order. Setting
Q = G_{1} G_{2} ... G_{k−1} G_{k}
we get:
A = Q R
which is our QR decomposition. Booyah.
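Putting it together (a sketch of my own, reusing the givensRotation above; the guard skips cells that are already zero):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
R = A.copy()
Q = np.eye(3)
# Sweep column by column, zeroing everything below the diagonal.
for j in range(R.shape[1]):
    for i in range(j + 1, R.shape[0]):
        if R[i, j] != 0.0:
            G = givensRotation(R, i, j)
            R = G @ R  # R = G_k ... G_1 A
            Q = Q @ G  # Q = G_1 ... G_k, since each G is its own inverse
print(np.allclose(A, Q @ R))       # True: we recover A
print(np.allclose(np.triu(R), R))  # True: R is upper triangular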
Given a triangle represented in Breeze as (assuming m was declared as a 3 × 3 DenseMatrix[Double], one vertex per row):
import breeze.linalg.{DenseMatrix, DenseVector}

val m = DenseMatrix.zeros[Double](3, 3)
m(0, ::) := DenseVector(1.0, 2.0, 3.0).t
m(1, ::) := DenseVector(6.0, 5.0, 4.0).t
m(2, ::) := DenseVector(7.0, 1.0, 9.0).t
we can transform it with Givens rotations to give us:
[Figure: the triangle before and after the transformation. Red is original, blue is reflection.]
Distributed Givens Rotation using Spark
I tried this in my own toy linear algebra library for Spark, Algernon. Performance currently sucks, but as a proof-of-concept it shows it can be done. I avoid Spark's matrix multiplication code for efficiency. Since the G matrix is mostly 1s on the diagonal, multiplying by it is effectively a no-op for every row but the ith and jth, so we only need to transform those two rows:
val mathOps = implicitly[Numeric[T]]
import session.sqlContext.implicits._
import mathOps.mkNumericOps

// c and s have already been calculated from the pivot cells; minusC is -c.
// The outer join lines up the two rows' cells by column index, "j".
ithRow.joinWith(jthRow, '_1 ("j") === '_2 ("j"), "outer").flatMap { case (x, y) =>
  // A linear combination of a cell from each row; a missing (null) cell counts as zero.
  def calculate(x: MatrixCell[T], y: MatrixCell[T], _c: T, _s: T) =
    {if (x == null) mathOps.zero else (x.x * _c)} + {if (y == null) mathOps.zero else (y.x * _s)}

  // New ith row: c * row_i + s * row_j; new jth row: s * row_i - c * row_j.
  val newIth = if (x == null) Seq() else Seq(MatrixCell(x.i, x.j, calculate(x, y, c, s)))
  val newJth = if (y == null) Seq() else Seq(MatrixCell(y.i, y.j, calculate(x, y, s, minusC)))
  newIth ++ newJth
}
Here, we're only interested in the ith and jth rows, multiplying them by the c and s values we have already calculated.
[1] Philip Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science.