Agile Java Man: Probabilistic Programming

In part 1, I looked at the first chapter of Bayesian Methods for Hackers. In this post, I'll conduct the same analysis on the same data but instead of using Python I'll be using Scala and Figaro.

To recap: we have somebody's text usage and we want to know when (if at all) there was a change in usage. All we have is the number of texts sent each day over a 74 day period.

We suppose that the distribution of texts per day is a Poisson distribution as we assume the text messages are independent of each other, it's a discrete distribution and it's got the nice property that the one constant it depends on (λ) is also its mean.

Unfortunately, we don't know what this parameter is so we need to model that stochastically too. This λ parameter is not discrete but continuous, so we choose the exponential distribution to model as it too has nice properties. It too depends on a constant parameter (α) and it has the nice property that its mean is α^-1.

Because of these nice properties the distributions have, we are helped in our initial guess at what their values are. In this case, we can take the mean number of texts (let's call it m) and then we know λ will (roughly) be that value. However, λ is not a fixed value, it's a stochastic variable modeled by an exponential function. That's OK though since we know that the mean of an exponential distribution is α^-1then α=m^-1.

Finally, we are assuming that on some day indicated by a value tau, we switch from one Poisson distribution to another so we need two of everything. Regarding the s, "because they’re parameters of a parameter, they’re sometimes called hyper-parameters." [1]

OK, enough of the maths, here's the code:

val data: List[Int] = // load the data from a file

val mean = data.sum.toDouble / data.length.toDouble
val alpha = 1.0 / mean

val lambda1: Element[Double] = Exponential(alpha)
val lambda2: Element[Double] = Exponential(alpha)

val poisson1 = Poisson(lambda1)
val poisson2 = Poisson(lambda2)

We don't know the initial value of tau so we guess it's initially likely to be any value in the 74 day period. We express that in Figaro thus:

val tau: AtomicUniform[Int] = Uniform(0 to n_count_data: _*)

And then we build our model to describe the situation outlined in the first few paragraphs thus:

def pDay(day: Int): Element[Int] = If (tau.map(value => value > day), poisson1, poisson2)

val modelData: Seq[Element[Int]] = (1 to n_count_data).map(d => pDay(d))

Finally, we tell the model what actual data is to help it know what it's shooting for:

modelData.zip(data).foreach { case(test, real) =>
test.observe(real) // WARNING! This does not play nicely with MH (see below)
}

Then we hand it off to an algorithm that solves the equation for us, Metropolis-Hastings.

val mh = MetropolisHastings(numSamples = 40000,
ProposalScheme(tau, lambda1, lambda2),
burnIn = 10000,
tau)
mh.start()

Oops. It gets stuck. Even if I reduce the number of samples from 40 000 to 10. Hmm. Well, it prompted me to ask Avi Pfeffer, the creator of Figaro, what the problem was.

"You have 70+ variables representing different days with 70+ possible values for each variable. When you observe all of these, you get an extremely unlikely observation. I think MH is struggling with finding any state that causes the observations to be satisfied."

That is, the chances of getting the actual data given the model we have built is staggeringly small. Annoyingly, I already knew this as I had read Dr Pfeffer's book where he built a probabilistic model of movie actors winning awards. Naturally, only one person can win the Oscar for best actor, so the model gets stuck as MH cannot explore any other actor winning the Oscar as it's a two step process to get there: before we select a new actor the original actor is no longer the winner (zero actors win the award - that's impossible) or we choose another actor who wins the award but before we say the original is not the winner (two actors win the award - also impossible).

There are two solutions. "The first is to use proposal schemes that are different from the default scheme. These custom schemes can, for example, propose to change two elements at once. The second idea is to avoid hard conditions that can never be violated." [1]

I chose the latter and wrote this:

def fitnessFnPowered(real: Int, power: Double): (Int) => Double = x => 1.0 / pow((abs(real - x) + 1), power)

That is, we calculate a value in the range [0,1] where 1 indicates a direct hit and anything else is less than this depending on how much off target it was. How quickly wrong proposals deviate from 1.0 is determined by the value of power.

modelData.zip(data).foreach { case(test, real) =>
test.addConstraint(fitnessFnPowered(real, ???))
}

And now my algorithm runs to completion. Well, it does if I give it a power rather than ???. What is a good value? By trial and error, I got:

And the probability became 1.0 (ie, certain) when the power value was set to 5.0. Although the distributions differ to the BM4H book, both frameworks agree on the most probable value for the day the usage change: day 45.

[1] Practical Probabilistic Programming, Avi Pfeffer.

Agile Java Man

Saturday, November 19, 2016

Probabilistic Programming - Part 2

No comments:

Post a Comment

Blog Archive

About Me