At the heart of Latent Semantic Analysis is a mathematical pattern called Singular Value Decomposition (SVD). This is an appropriate technique when you have sparse data in many dimensions. It reduces it to smaller dimensions with (hopefully) little loss of accuracy.

First, a little mathematical refresher.

__Eigenvalues and Eigenvectors__

An eigenvector is a vector that when multiplied by a matrix points in the same direction. It's length may have changed by (a non-zero) factor called the eigenvalue.

Only square matrices can have eigenvalues and eigenvectors. Anything non-square will change the dimension of the vector so the result would never be a multiple of itself. SVD is a hack to work with non-square matrices.

What's more, not all matrices will have real valued eigen-vectors/values. Let's say the matrix is a rotation. No vector (apart from the 0-vector) will point in the same direction after all rotations. In Python, this is a 45-degree rotation.

import numpy as np

def no_complex_elements(vec):

np.all(map(lambda arr: not np.iscomplex(arr.flatten), vec))

def print_eigenvectors_and_eigenvalues(matrix):

eigenvalues, eigenvectors = np.linalg.eig(rotation)

print "real eigenvalues \n", filter(lambda scalar: not isinstance(scalar, complex), eigenvalues)

print "real eigenvectors\n", filter(lambda vector: no_complex_elements(vector), eigenvectors)

if __name__ == "__main__":

rotation = np.matrix('0.707106781 -0.707106781 0;'

'0.707106781 0.707106781 0;'

'0 0 1')

print_eigenvectors_and_eigenvalues(rotation)

and it output's an empty list.

__The trolley-line-location problem__

In "Coding the Matrix", Philip Klein compares SVD to finding the optimal path for a straight tram line to run through a city. I'll skip most the proofs (they're in the book) but the gist goes like this:

Imagine the tram line must go through the city centre (the origin) while getting as close as possible to fixed land marks located with vectors

**a**.

_{i}Make each

**a**a row in a vector

_{i}**A**.

Now, each point

**a**can be expressed in terms of two vectors, one parallel to the tram line,

_{i}**a**, and one perpendicular to it,

_{i}^{||v}**a**(notice that we're choosing a new co-ordinate system) and that since

_{i}^{¬v}then, squaring:a=_{i}a+_{i}^{¬v}a_{i}^{||v}

||and re-arranging:a_{i}||^{2}= ||a_{i}^{¬v}||^{2}+ ||a_{i}^{||v }||^{2}

||a_{i}^{¬v}||^{2 }= ||a_{i}||^{2}- ||a_{i}^{||v }||^{2}

This, of course, is our distance from the tram line squared. If we want to minimize this value for all landmarks, we can express the problem as finding the minimum of:

∑ i |
(||
a_{i}||^{2} - ||a_{i}^{||v }||^{2}) |

||A||_{F}^{2 }- ||Av||^{2}

where the F indicates the Frobenius Norm (nothing too scary, just a generalization of a vector multiplied by itself) and where the second term is just us cleaning up all the other terms:

∑ i |
<
a_{i}, v>^{2} |

**a**

_{i},

**v**>is just the dot-product of

**a**

_{i}and the unit vector,

**v**which is of course

**a**

_{i}

^{||v}.

Seems a lot of work. Don't worry, it's all been high-school maths so far.

Anyway, it's this second term that we want to maximize. As the first term is fixed, maximizing the second will find the overall minimum for the whole equation. We won't worry too much how we do this - there are numerical techniques for doing it - this post is just to give the flavour.

Having chosen a suitable

__unit__vector to maximize the second term, we'll call it v

_{1}to remind us that this optimizing of distances between landmarks and the tram line can be viewed as a 1-dimensional approximation to covering all the landmarks as best we can using just a straight line. Perhaps we want to place a tram-stop at the point closest to each landmark. In which case, we could build a set of vectors similar to

**A**only this time each row is a vector to a tram stop. Let's call this matrix

**Ã**. Furthermore, we call ||A

**v**

_{1}|| the

*first singular value*(or σ

_{1}) and v

_{1}the

*first right singular vector*.

Then, the error in our approximation is

**A**-

**Ã**and we also know that this represents the distance from the tram line to landmark in each row in resultant matrix. Well, we've already calculated this, so:

||Note that the positions of these tram stops can also be calculated as the projection ofA-Ã||_{F}^{2 }= ||A||_{F}^{2 }- ||Av||^{2 }= ||A||_{F}^{2 }- σ_{1}^{2}

**a**

_{i}onto

**v**

_{1}. So, each row of

**Ã**can be expressed as:

<So, taking this equation and re-expressinga_{i,}v_{1}>v_{1}

where we define σÃ=Av_{1 }v_{1}^{T }= σ_{1 }u_{1 }v_{1}^{T}

_{1 }

**u**

_{1}=

**A**

**v**

_{1}

This is starting to look like our familiar equation in SVD. So, it only remains to note that this "trolley-line" example is a specific case. We could expand it to more than a 2-D map.

We do this by finding the unit vector

**v**

_{2}that is orthogonal to

**v**

_{1}and maximises ||A

**v**|| and again for

**v**

_{3},

**v**

_{4}, etc.

It doesn't take a great deal of effort to now show that σ

_{1}, σ

_{2}, σ

_{3}etc can be turned into a diagonal matrix and that would give us the SVD equation.

It's also not too hard to see that the values of this diagonal matrix are in descending order since at each stage we found the maximum before moving on to the next step.

## No comments:

## Post a Comment