The problem
If A is a matrix whose columns are {v_{1}, v_{2}, v_{3}, ... v_{n} }, then any point in its span can be described with an appropriate choice of x thus:
A x = x_{1}v_{1} + x_{2}v_{2} + ... + x_{n}v_{n}
What this means is that any vector b in the span can be expressed as A x for some choice of x. Equivalently, if we minimize ‖b − A x‖ we're solving the least squares problem, beloved of loss functions in machine learning. "The advantage of an algorithm for least squares is that it can be used even when the matrix-vector equation has no solution" [1].
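For instance, numpy's least squares solver happily handles an overdetermined system (a minimal sketch; the matrix and vector are made up for illustration):

import numpy as np

# Three equations, two unknowns: A x = b has no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# lstsq minimizes ||b - A x|| even though no x solves A x = b exactly.
x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x)         # the least squares solution
print(residual)  # squared norm of b - A x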
One algorithm: Gram-Schmidt
The approach used in "Coding the Matrix" is the Gram-Schmidt method. The circuitous route goes like this:
We want to find the point closest to a vector, b, in a space defined by the span of the vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}. We will assume that v_{i} is orthogonal to v_{j} for all i ≠ j. If they're not orthogonal, it's not hard to turn them into vectors that are, but that's another problem.
Given a k-dimensional vector space, V, the vector b can be expressed as the sum of two vectors, one that is in V and one that is orthogonal to V. In Klein's notation, this says b = b^{V} + b^{⊥V}. "The point in V closest to b is b^{V} and the distance is ‖b^{⊥V}‖ ... it suffices to find b^{⊥V}, for we can then obtain b^{V}." [1]
Given the orthogonal set of vectors {v_{1}, v_{2}, v_{3}, ... v_{n}}, b^{⊥V} equals b with all the projections, <b, v_{i}>, shaved off. This creates a recursive algorithm that looks like:
b_{i} = b_{i−1} − (<b_{i−1}, v_{i}> / <v_{i}, v_{i}>) v_{i}
where b_{0} = b. For convenience, we say σ_{i} = <b_{i−1}, v_{i}> / <v_{i}, v_{i}>, so each step shaves σ_{i}v_{i} off the previous remainder, and b_{n} = b^{⊥V}. Now b can be expressed as
b = σ_{1}v_{1} + ... + σ_{n}v_{n} + b^{⊥V}
which, just like the matrix at the top of this post, can be expressed as:
b = [v_{1} ... v_{n} b^{⊥V}] [σ_{1} ... σ_{n} 1]^{T}
that is, the matrix whose columns are v_{1}, ..., v_{n}, b^{⊥V} multiplied by the vector of coefficients (σ_{1}, ..., σ_{n}, 1).
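Here is a minimal sketch of that recursion in Python (numpy; the function name projectOrthogonal is my own, not Klein's):

import numpy as np

def projectOrthogonal(b, vs):
    # Shave the projection onto each (mutually orthogonal) v off b.
    for v in vs:
        sigma = np.dot(b, v) / np.dot(v, v)  # sigma_i = <b_{i-1}, v_i> / <v_i, v_i>
        b = b - sigma * v                    # b_i = b_{i-1} - sigma_i v_i
    return b

The value returned is b^{⊥V}, and the sigma values computed along the way are exactly the coefficients in the expansion above.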
Similarly, if {v_{1}, v_{2}, v_{3}, ... v_{n}} are not orthogonal, we can invoke the same recursive algorithm to generate the original vectors from a linear combination of mutually orthogonal vectors, {v_{1}*, v_{2}*, v_{3}*, ... v_{n}*}. It would look like this:
[v_{1} ... v_{n}] = [v_{1}* ... v_{n}*] R
where each v_{i} is a combination of only v_{1}*, ..., v_{i}*, so the matrix of coefficients, R, is upper triangular.
QR-factorization
QR-factorization is reducing an m × n matrix A into two matrices, Q and R, where Q is an m × n column-orthogonal matrix (i.e., the columns are mutually orthogonal and have unit norm) and R is a triangular matrix [1]. In other words, it looks like the matrix above where R is the upper triangular matrix.
A = Q R
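Gram-Schmidt delivers exactly this factorization: normalize the starred vectors to get the columns of Q, and the shaved-off coefficients fill in the upper triangle of R. A sketch of my own in numpy (assuming A has full column rank, so no division by zero):

import numpy as np

def gramSchmidtQR(A):
    # Orthogonalize the columns of A, recording the coefficients in R.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = np.dot(Q[:, j], A[:, i])  # projection onto an earlier column
            v -= R[j, i] * Q[:, j]              # shave the projection off, as above
        R[i, i] = np.linalg.norm(v)
        Q[:, i] = v / R[i, i]                   # unit-norm column of Q
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = gramSchmidtQR(A)
print(np.allclose(A, Q @ R))  # True: the factors recover A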
Now, there is another way of achieving the same end. We can rotate the matrix such that a particular cell becomes zero. Keep doing this and we get the upper triangular matrix as above. This is called a Givens rotation. Let's call each rotation G_{i}; then:
R = G_{k} G_{k−1} ... G_{1} A
But note that each rotation is of the form:
| 1  0  .  .  .  . |
| 0  1  .  .  .  . |
| .  .  .  .  .  . |
| .  .  c  .  s  . |
| .  .  .  .  .  . |
| .  .  s  . −c  . |
| .  .  .  .  .  . |
| 0  0  0  0  0  1 |
where c^{2}+s^{2}=1, the c and −c values sit on the diagonal in rows i and j, the s values sit at (i, j) and (j, i), and all the other diagonal entries are 1. Or, to put it more succinctly in Python:
import numpy as np

def givensRotation(A, i, j):
    # Start from the identity and overwrite the entries at rows/columns i and j
    # so that G @ A zeroes out the cell A[i, j].
    G = np.eye(len(A))
    aij = A[i, j]
    ajj = A[j, j]
    r = ((aij ** 2) + (ajj ** 2)) ** 0.5
    c = ajj / r
    s = -aij / r
    # The 2x2 block [c s; s -c] is symmetric and orthogonal: a reflection
    # that is its own inverse.
    G[j, i] = s
    G[j, j] = -c
    G[i, i] = c
    G[i, j] = s
    return G
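For example (values picked arbitrarily):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
G = givensRotation(A, 1, 0)           # zero out the cell at row 1, column 0
print((G @ A)[1, 0])                  # ~0.0
print(np.allclose(G @ G, np.eye(3)))  # True: G is its own inverse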
Inverting the rotations to recover A gives:
A = G_{1}^{−1} G_{2}^{−1} ... G_{k−1}^{−1} G_{k}^{−1} R
but as luck would have it, these G matrices are their own inverses (since c^{2}+s^{2}=1 and each G is symmetric. Try it.). So:
A = G_{1} G_{2} ... G_{k−1} G_{k} R
and all we need to do is keep the original G matrices in the right order. Setting
Q = G_{1} G_{2} ... G_{k−1} G_{k}
we get:
A = Q R
which is our QR decomposition. Booyah.
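Putting it together (a sketch of my own, reusing the givensRotation above; the guard skips cells that are already zero):

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0],
              [7.0, 1.0, 9.0]])
R = A.copy()
Q = np.eye(3)
# Sweep column by column, zeroing everything below the diagonal.
for j in range(R.shape[1]):
    for i in range(j + 1, R.shape[0]):
        if R[i, j] != 0.0:
            G = givensRotation(R, i, j)
            R = G @ R  # R = G_k ... G_1 A
            Q = Q @ G  # Q = G_1 ... G_k, since each G is its own inverse
print(np.allclose(A, Q @ R))       # True: we recover A
print(np.allclose(np.triu(R), R))  # True: R is upper triangular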
Given a triangle represented in Breeze as (assuming m was declared as a 3 × 3 DenseMatrix[Double], one vertex per row):
import breeze.linalg.{DenseMatrix, DenseVector}

val m = DenseMatrix.zeros[Double](3, 3)
m(0, ::) := DenseVector(1.0, 2.0, 3.0).t
m(1, ::) := DenseVector(6.0, 5.0, 4.0).t
m(2, ::) := DenseVector(7.0, 1.0, 9.0).t
we can transform it with Givens rotations to give us:
[Figure: the triangle before and after the transformation. Red is original, blue is reflection.]
Distributed Givens Rotation using Spark
I tried this in my own toy linear algebra library for Spark, Algernon. Performance currently sucks, but as a proof-of-concept it shows it can be done. I avoid Spark's matrix multiplication code for efficiency. Since the G matrix is mostly 1s on the diagonal, multiplying by it is effectively a no-op for every row but the ith and jth, so we only need to transform those two rows:
val mathOps = implicitly[Numeric[T]]
import session.sqlContext.implicits._
import mathOps.mkNumericOps

// c and s have already been calculated from the pivot cells; minusC is -c.
// The outer join lines up the two rows' cells by column index, "j".
ithRow.joinWith(jthRow, '_1 ("j") === '_2 ("j"), "outer").flatMap { case (x, y) =>
  // A linear combination of a cell from each row; a missing (null) cell counts as zero.
  def calculate(x: MatrixCell[T], y: MatrixCell[T], _c: T, _s: T) =
    {if (x == null) mathOps.zero else (x.x * _c)} + {if (y == null) mathOps.zero else (y.x * _s)}

  // New ith row: c * row_i + s * row_j; new jth row: s * row_i - c * row_j.
  val newIth = if (x == null) Seq() else Seq(MatrixCell(x.i, x.j, calculate(x, y, c, s)))
  val newJth = if (y == null) Seq() else Seq(MatrixCell(y.i, y.j, calculate(x, y, s, minusC)))
  newIth ++ newJth
}
Here, we're only interested in the ith and jth rows, multiplying them by the c and s values we have already calculated.
[1] Philip Klein, Coding the Matrix: Linear Algebra through Applications to Computer Science.