Nothing to do with Roald Dahl but a trick used to optimize machine learning algorithms (see Spark's mllib library). We're iteratively trying to find the lowest point in some space and representing this value with m

_{k}where k is the iteration step number.

We start with the quadratic model:

m

taken from a Taylor expansion where

At the minimum, ∇m

So, the next step can be given as:

where α

A positive definite matrix, M, satisfies x

f(

where obviously ∇ f(

See p16 of Numerical Optimization (Norcedel & Wright) for a more convincing argument.

We can treat the space just like an electromagnetic field that has an electric potential at every point in that space. So, although there is a

Lets' take an example (full source code available here):

import mayavi.mlab as plt

import numpy as np

import math as m

from sympy import *

x, y, z = symbols('x y z')

I'll flesh out the algorithm more in the next post.

m

_{k}= f(**x**_{k}) + ∇ f(**x**_{k})^{T}**p**_{k}+ (**p**_{k}^{T}B_{k}**p**_{k}) / 2taken from a Taylor expansion where

**p**_{k}is the step and B_{k}is a Hessian matrix that is positive definite (for reasons concerning convergence - see below).At the minimum, ∇m

_{k}will equal 0. Also, the last term will disappear (since our model is quadratic). Re-arranging gives:**p**_{k}= - B_{k}^{-1}∇ f(**x**_{k})So, the next step can be given as:

**x**_{k+1}=**x**_{k}+ α_{k}B_{k}^{-1}∇ f(**x**_{k})where α

_{k}is the step size at the k-th step.__Why positive definite?__A positive definite matrix, M, satisfies x

^{T}M x > 0 for all x. Note that if we do a Taylor expansion around the minimum point**x**^{*}, we getf(

**x**^{*}+**p**) = f(**x**^{*}) +**p**^{T}∇ f(**x**^{*}) +**p**^{T}∇^{2}f(**x**^{*})**p**/ 2 + ...where obviously ∇ f(

**x**^{*}) is zero and f(**x**^{*}) is less than f(**x**^{*}+**p**) by the definition of**x**^{*}. Therefore, the final term is greater than 0 and hence it is positive definite. QED.See p16 of Numerical Optimization (Norcedel & Wright) for a more convincing argument.

__Surfaces__We can treat the space just like an electromagnetic field that has an electric potential at every point in that space. So, although there is a

**scalar**associated with every point in that space,*how that scalar changes*can lead to a**vector**associated with every point in that space.Lets' take an example (full source code available here):

import mayavi.mlab as plt

import numpy as np

import math as m

from sympy import *

x, y, z = symbols('x y z')

**f = z ** 2 * cos(x) ** 2 + cos(y) ** 2**

what does this look like? Well, since it's a scalar value at every point in 3-D space, it's hard to visualize. We could view it as a heatmap at various values of z (that is, take a slice on the x-y plane for a given z value).

Slices of the space of f at given values of z |

Or we could fix f for an arbitrary value and look at the surface that is defined by all points that map to that value (an "equipotential" in physics terms). For instance:

Surfaces at f=0 (bottom), f=2 (middle) and f=4 (top) |

For simplicity, let's arbitrarily look at the surface formed by f=0 and look at the gradient - that is ∇f(

Because I'm a careless mathematician, it's a pleasure to find that Sympy does the work for me. I write:

div = [f.diff(v) for v in [x, y, z]]

**x**_{k}) - along the line x=0 that we chose arbitrarily.Because I'm a careless mathematician, it's a pleasure to find that Sympy does the work for me. I write:

div = [f.diff(v) for v in [x, y, z]]

then I can calculate the gradient vector at all points on this surface with:

def gradient(xpt, ypt, zpt):

gradx = div[0].subs(x, xpt).subs(y, ypt).subs(z, zpt).evalf()

grady = div[1].subs(x, xpt).subs(y, ypt).subs(z, zpt).evalf()

gradz = div[2].subs(x, xpt).subs(y, ypt).subs(z, zpt).evalf()

return float(gradx), float(grady), float(gradz)

and plot it so:

gradient where f=0 and x=0 |

Unsurprisingly, the gradient vectors are perpendicular to the surface.

Now we add the vectors corresponding to B

def eval_p_at(xpt, ypt, zpt):

co_ords = [xpt, ypt, zpt]

hessian_fn = [[f.diff(a).diff(b) for a in [x, y, z]] for b in [x, y, z]]

values = [sub_vals(item, co_ords) for item in hessian_fn]

hess_mat_eval = Matrix(values)

div_vals = sub_vals(div, co_ords)

if hess_mat_eval.det() == 0:

print "uninvertible hessian at ", xpt, ypt, zpt

return 0, 0, 0

else:

p = - (hess_mat_eval.inv() * Matrix(div_vals))

return float(p[0]), float(p[1]), float(p[2])

def sub_vals(items, vals):

return [i.subs(x, vals[0]).subs(y, vals[1]).subs(z, vals[2]).evalf() for i in items]

where this is the equation for

Now we add the vectors corresponding to B

_{k}^{-1}∇ f(**x**_{k}) along this line of x=0. Again, Sympy does all the heavy lifting:def eval_p_at(xpt, ypt, zpt):

co_ords = [xpt, ypt, zpt]

hessian_fn = [[f.diff(a).diff(b) for a in [x, y, z]] for b in [x, y, z]]

values = [sub_vals(item, co_ords) for item in hessian_fn]

hess_mat_eval = Matrix(values)

div_vals = sub_vals(div, co_ords)

if hess_mat_eval.det() == 0:

print "uninvertible hessian at ", xpt, ypt, zpt

return 0, 0, 0

else:

p = - (hess_mat_eval.inv() * Matrix(div_vals))

return float(p[0]), float(p[1]), float(p[2])

def sub_vals(items, vals):

return [i.subs(x, vals[0]).subs(y, vals[1]).subs(z, vals[2]).evalf() for i in items]

where this is the equation for

**p**we see at the top of this post. Plotting it gives:Gradient and p along f=0, x=0 |

Note how the

**p**vectors are all pointing downwards, trying to minimize f.
Now, if we look at these vectors of

**p**at varies points on this arbitrarily chosen surface, f=0, it looks like this:p vectors on f=0 |

which is starting to look like our heatmap if we looked directly down on it. The

**p**vectors are trying to find the "coldest" spot, the minimum of the function f.I'll flesh out the algorithm more in the next post.

## No comments:

## Post a Comment