## Monday, July 18, 2016

### The theory that dare not speak its name

 From XKCD (https://xkcd.com/1132/)
The author of the excellent XKCD cartoons misrepresented the arguments of Frequentists and Bayesians for comic effect, and found he'd "kicked a hornet's nest" by doing so.

Bayes' Theorem has been controversial for hundreds of years. In fact, books have been written on it (see here for a recent and popular one). One reason is that it depends on a subjective choice of prior. But Allen B Downey, author of Think Bayes (freely downloadable from here), thinks this is a good thing. "Science is always based on modeling decisions, and modeling decisions are always subjective. Bayesian methods make these decisions explicit, and that’s a feature, not a bug. But all hope for objectivity is not lost. Even if you and I start with different priors, if we see enough data (and agree on how to interpret it) our beliefs will converge. And if we don’t have enough data to converge, we’ll be able to quantify how much uncertainty remains." [1]
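Downey's point about priors converging can be illustrated with a toy example that is not taken from his book. The sketch below assumes two people hold different Beta priors over a coin's chance of heads and then see the same flips. The names `PriorConvergence`, `Prior`, `posteriorMean` and `disagreement` are hypothetical, chosen only for this illustration.

```scala
object PriorConvergence {
  // A Beta(a, b) prior over a coin's probability of heads; its mean is a / (a + b).
  final case class Prior(a: Double, b: Double)

  // The Beta prior is conjugate to coin flips, so the posterior is
  // Beta(a + heads, b + tails) and its mean is a simple ratio of counts.
  def posteriorMean(prior: Prior, heads: Int, tails: Int): Double =
    (prior.a + heads) / (prior.a + prior.b + heads + tails)

  val optimist = Prior(8.0, 2.0) // starts out expecting heads about 80% of the time
  val sceptic  = Prior(2.0, 8.0) // starts out expecting heads about 20% of the time

  // How far apart the two posterior means are after seeing the same data.
  def disagreement(heads: Int, tails: Int): Double =
    math.abs(posteriorMean(optimist, heads, tails) - posteriorMean(sceptic, heads, tails))
}
```

With no data the two disagree by 0.6. After 6 heads and 4 tails the gap is 0.3, and after 600 heads and 400 tails it is below 0.01, which is the convergence the quote describes.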

#### Bayesian Networks

Taking the examples from Probabilistic Graphical Models, available at Coursera, imagine this scenario: a student wants a letter of recommendation. The letter depends on their grade, the grade depends on their intelligence and on the difficulty of the course, and their SAT score depends on their intelligence alone, like so:

The course uses SamIam but I'm going to use Figaro as it's written in Scala. Representing these probability dependencies then looks like this:

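import com.cra.figaro.algorithm.factored.VariableElimination
import com.cra.figaro.language._
import com.cra.figaro.library.compound.{CPD, CPD1, CPD2}

class ProbabilisticGraphicalModels {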

implicit val universe = Universe.createNew()

val Easy          = "Easy"
val Hard          = "Hard"
val Dumb          = "Dumb"
val Smart         = "Smart"
val A             = "A"
val B             = "B"
val C             = "C"
val GoodSat       = "GoodSat"
val BadSat        = "BadSat"
val Letter        = "Letter"
val NoLetter      = "NoLetter"

def chancesOfDifficultIs(d: Double): Chain[Boolean, String]
= Chain(Flip(d), (b: Boolean) => if (b) Constant(Hard) else Constant(Easy))

def chancesOfSmartIs(d: Double): Chain[Boolean, String]
= Chain(Flip(d), (b: Boolean) => if (b) Constant(Smart) else Constant(Dumb))

def gradeDistributionWhen(intelligence: Chain[Boolean, String] = defaultIntelligence,
difficulty:   Chain[Boolean, String] = defaultDifficulty): CPD2[String, String, String]
= CPD(intelligence, difficulty,
(Dumb, Easy)   -> Select(0.3   -> A,   0.4   -> B,   0.3   -> C),
(Dumb, Hard)   -> Select(0.05  -> A,   0.25  -> B,   0.7   -> C),
(Smart, Easy)  -> Select(0.9   -> A,   0.08  -> B,   0.02  -> C),
(Smart, Hard)  -> Select(0.5   -> A,   0.3   -> B,   0.2   -> C)
)

def satDist(intelligence: Chain[Boolean, String] = defaultIntelligence): CPD1[String, String]
= CPD(intelligence,
Dumb  -> Select(0.95  -> BadSat,  0.05  -> GoodSat),
Smart -> Select(0.2   -> BadSat,  0.8   -> GoodSat)
)

def letterDist(gradeDist: CPD2[String, String, String] = defaultGradeDist): CPD1[String, String]
= CPD(gradeDist,
A -> Select(0.1   -> NoLetter,  0.9   -> Letter),
B -> Select(0.4   -> NoLetter,  0.6   -> Letter),
C -> Select(0.99  -> NoLetter,  0.01  -> Letter)
)
.
.
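Read as a Bayesian network, the conditional tables above, together with the priors on difficulty and intelligence, amount to the following factorisation of the joint distribution. The single-letter symbols (D for difficulty, I for intelligence, G for grade, S for SAT, L for letter) are my shorthand and do not appear in the code.

```latex
P(D, I, G, S, L) = P(D)\,P(I)\,P(G \mid I, D)\,P(S \mid I)\,P(L \mid G)
```

Each factor corresponds to one element in the model: the two `Chain`s give P(D) and P(I), and the three `CPD`s give the conditional terms.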

And we can query it like so:

.
.
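def probabilityOf[T](target: Element[T], fn: (T) => Boolean): Double = {
val ve = VariableElimination(target)
ve.start()
val probability = ve.probability(target, fn)
ve.kill() // release the algorithm's resources once we have the answer
probability
}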
.
.

Finally, let's add some sugar so I can use it in ScalaTests:

.
.
val defaultDifficulty   = chancesOfDifficultIs(0.6)
val easier              = chancesOfDifficultIs(0.5)
val defaultIntelligence = chancesOfSmartIs(0.7)
val defaultGradeDist    = gradeDistributionWhen(intelligence = defaultIntelligence, difficulty = defaultDifficulty)
val defaultLetterDist   = letterDist(defaultGradeDist)
val defaultSatDist      = satDist(defaultIntelligence)

def being(x: String): (String) => Boolean = _ == x

def whenGettingAnABecomesHarder(): Unit = defaultGradeDist.addConstraint(x => if (x == A) 0.1 else 0.9)

def whenTheCourseIsMoreLikelyToBeHard(): Unit = defaultDifficulty.addConstraint(x => if (x == Hard) 0.99 else 0.01)

def whenLetterBecomesLessLikely(): Unit = defaultLetterDist.addConstraint(x => if (x == Letter) 0.1 else 0.9)

def whenTheSatIsKnownToBeGood(): Unit = defaultSatDist.observe(GoodSat)

}

#### Flow of Probabilistic Influence

If we wanted to know the chances of receiving a letter of recommendation, we'd just have to run probabilityOf(defaultLetterDist, being(Letter)) (which equals about 0.603656). The interesting thing is what happens if we observe some facts: does this change the outcome?

The answer is: it depends.

If we observe the SAT score for an individual, then their probability of receiving the letter changes. For example, executing defaultSatDist.observe(GoodSat) means that probabilityOf now returns about 0.712 (and similarly, observing BadSat reduces the probability to about 0.457).
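These figures can be cross-checked without Figaro by enumerating the joint distribution by hand. What follows is a minimal sketch in plain Scala, assuming the tables from the model above. The names `StudentNetwork`, `World`, `prob` and `conditional` are mine and are not part of Figaro or the course.

```scala
object StudentNetwork {
  // One complete assignment to all five variables, with its joint probability.
  final case class World(difficulty: String, intelligence: String, grade: String,
                         sat: String, letter: String, p: Double)

  // Tables copied from the Figaro model (60% Hard, 70% Smart by default).
  val pDifficulty   = Map("Easy" -> 0.4, "Hard" -> 0.6)
  val pIntelligence = Map("Dumb" -> 0.3, "Smart" -> 0.7)
  val pGrade = Map(
    ("Dumb", "Easy")  -> Map("A" -> 0.3,  "B" -> 0.4,  "C" -> 0.3),
    ("Dumb", "Hard")  -> Map("A" -> 0.05, "B" -> 0.25, "C" -> 0.7),
    ("Smart", "Easy") -> Map("A" -> 0.9,  "B" -> 0.08, "C" -> 0.02),
    ("Smart", "Hard") -> Map("A" -> 0.5,  "B" -> 0.3,  "C" -> 0.2))
  val pSat = Map(
    "Dumb"  -> Map("BadSat" -> 0.95, "GoodSat" -> 0.05),
    "Smart" -> Map("BadSat" -> 0.2,  "GoodSat" -> 0.8))
  val pLetter = Map(
    "A" -> Map("NoLetter" -> 0.1,  "Letter" -> 0.9),
    "B" -> Map("NoLetter" -> 0.4,  "Letter" -> 0.6),
    "C" -> Map("NoLetter" -> 0.99, "Letter" -> 0.01))

  // Every combination of values, weighted by the product of the five factors.
  val worlds: Seq[World] = for {
    (d, pd) <- pDifficulty.toSeq
    (i, pi) <- pIntelligence.toSeq
    (g, pg) <- pGrade((i, d)).toSeq
    (s, ps) <- pSat(i).toSeq
    (l, pl) <- pLetter(g).toSeq
  } yield World(d, i, g, s, l, pd * pi * pg * ps * pl)

  // P(event): add up the worlds in which the event holds.
  def prob(event: World => Boolean): Double =
    worlds.filter(event).map(_.p).sum

  // P(event | evidence) = P(event and evidence) / P(evidence).
  def conditional(event: World => Boolean, evidence: World => Boolean): Double =
    prob(w => event(w) && evidence(w)) / prob(evidence)

  val letter        = prob(_.letter == "Letter")                             // about 0.603656
  val letterGoodSat = conditional(_.letter == "Letter", _.sat == "GoodSat") // about 0.712
  val letterBadSat  = conditional(_.letter == "Letter", _.sat == "BadSat")  // about 0.457
}
```

Brute-force enumeration only works because the network has 48 possible worlds. Variable elimination, which the Figaro code uses, reaches the same answers without building the full joint.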

The general idea is given in this screen grab:

 From Daphne Koller's Probabilistic Graphical Models

The X -> Y and X <- Y cases are easy. "In general, probabilistic influence is symmetrical. That is, if X can influence Y, Y can influence X." [2]

Influence can also flow through an intermediate node in this Bayesian network, so X -> W -> Y and X <- W <- Y are not terribly surprising as long as we know nothing about the intermediate step W (the column on the left hand side). Conversely, if we have observed W, that's all we need: it blocks the trail, and it no longer matters what the first step does.

The V-structure (X -> W <- Y) is more interesting. Here, we can use ScalaTest to demonstrate what's going on. The first case is when we have no observed evidence. Here, Koller tells us

"If I tell you that a student took a class and the class is difficult, does that tell you anything about the student's intelligence? And the answer is 'no'."
And sure enough, Figaro agrees:

"probabilities" should {
"not flow X -> W <- Y ('V-structure') without evidence" in new ProbabilisticGraphicalModels {
val smartBefore = probabilityOf(defaultIntelligence, being(Smart))
whenTheCourseIsMoreLikelyToBeHard()
val smartAfter  = probabilityOf(defaultIntelligence, being(Smart))
smartBefore shouldEqual smartAfter
}

with the caveat that this holds only if
"... W and all of its descendants are not observed."
If W or one of its descendants is observed, the V-structure is activated and the following is true:

"flow X -> W <- Y ('V-structure') given evidence about a descendant of W" in new ProbabilisticGraphicalModels {
// Can difficulty influence intelligence via letter?
val smartBefore = probabilityOf(defaultIntelligence, being(Smart))
whenLetterBecomesLessLikely() // "this too activates the V-structure"
val smartAfter  = probabilityOf(defaultIntelligence, being(Smart))
smartBefore should be > smartAfter
}
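The activation of the V-structure can also be seen by conditioning on hard evidence rather than the soft constraints used in the tests. The sketch below is plain Scala, not Figaro, and enumerates the joint distribution over difficulty, intelligence, grade and letter. The SAT node is left out because it is an unobserved leaf and sums to one. `VStructureCheck` and its members are hypothetical names of mine.

```scala
object VStructureCheck {
  final case class World(difficulty: String, intelligence: String,
                         grade: String, letter: String, p: Double)

  // Tables copied from the Figaro model (60% Hard, 70% Smart by default).
  val pDifficulty   = Map("Easy" -> 0.4, "Hard" -> 0.6)
  val pIntelligence = Map("Dumb" -> 0.3, "Smart" -> 0.7)
  val pGrade = Map(
    ("Dumb", "Easy")  -> Map("A" -> 0.3,  "B" -> 0.4,  "C" -> 0.3),
    ("Dumb", "Hard")  -> Map("A" -> 0.05, "B" -> 0.25, "C" -> 0.7),
    ("Smart", "Easy") -> Map("A" -> 0.9,  "B" -> 0.08, "C" -> 0.02),
    ("Smart", "Hard") -> Map("A" -> 0.5,  "B" -> 0.3,  "C" -> 0.2))
  val pLetter = Map(
    "A" -> Map("NoLetter" -> 0.1,  "Letter" -> 0.9),
    "B" -> Map("NoLetter" -> 0.4,  "Letter" -> 0.6),
    "C" -> Map("NoLetter" -> 0.99, "Letter" -> 0.01))

  val worlds: Seq[World] = for {
    (d, pd) <- pDifficulty.toSeq
    (i, pi) <- pIntelligence.toSeq
    (g, pg) <- pGrade((i, d)).toSeq
    (l, pl) <- pLetter(g).toSeq
  } yield World(d, i, g, l, pd * pi * pg * pl)

  type Event = World => Boolean
  def prob(event: Event): Double = worlds.filter(event).map(_.p).sum
  def conditional(event: Event, evidence: Event): Double =
    prob(w => event(w) && evidence(w)) / prob(evidence)
  def both(a: Event, b: Event): Event = w => a(w) && b(w)

  val smart: Event    = _.intelligence == "Smart"
  val hard: Event     = _.difficulty == "Hard"
  val gotC: Event     = _.grade == "C"
  val noLetter: Event = _.letter == "NoLetter"

  // No evidence on grade or letter: difficulty tells us nothing about intelligence.
  val smartPrior     = prob(smart)              // 0.7
  val smartGivenHard = conditional(smart, hard) // still 0.7

  // Grade (W) observed: difficulty now changes our belief in intelligence.
  val smartGivenC     = conditional(smart, gotC)             // about 0.356
  val smartGivenCHard = conditional(smart, both(gotC, hard)) // 0.4

  // Letter (a descendant of W) observed: the V-structure is activated too.
  val smartGivenNoLetter     = conditional(smart, noLetter)             // about 0.490
  val smartGivenNoLetterHard = conditional(smart, both(noLetter, hard)) // about 0.518
}
```

In both activated cases, learning that the course was hard raises the probability that the student is smart, because the hard course partly explains away the poor result.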

As mentioned, if we do know something about the intermediate step, the first step can't influence the outcome. Take these examples:
"I know the student got an A in the class, now I'm telling you that the class is really hard. Does that change the probability of the distribution of the letter? No because ... the letter only depends on the grade."
"Given evidence about Z" should {

"make no difference in (X -> W -> Y)" in new ProbabilisticGraphicalModels {
defaultGradeDist.observe(A)
val letterChanceBefore  = probabilityOf(defaultLetterDist, being(Letter))
whenTheCourseIsMoreLikelyToBeHard()
val letterChanceAfter  = probabilityOf(defaultLetterDist, being(Letter))
letterChanceBefore shouldEqual letterChanceAfter
}
"If I tell you that the student is intelligent, then there is no way the SAT can influence the probability influence in grade."
"make no difference in X <- W -> Y" in new ProbabilisticGraphicalModels {
defaultIntelligence.observe(Smart)
val chancesOfABefore = probabilityOf(defaultGradeDist, being(A))
whenTheSatIsKnownToBeGood()
val chancesOfAAfter = probabilityOf(defaultGradeDist, being(A))
chancesOfAAfter shouldEqual chancesOfABefore
}
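Both blocked trails can be checked by brute-force enumeration of the joint distribution, using hard conditioning in place of the soft constraints in the tests. This is a plain Scala sketch under the same tables as the Figaro model. `BlockedTrailCheck` and its members are illustrative names of mine, not Figaro API.

```scala
object BlockedTrailCheck {
  final case class World(difficulty: String, intelligence: String, grade: String,
                         sat: String, letter: String, p: Double)

  // Tables copied from the Figaro model (60% Hard, 70% Smart by default).
  val pDifficulty   = Map("Easy" -> 0.4, "Hard" -> 0.6)
  val pIntelligence = Map("Dumb" -> 0.3, "Smart" -> 0.7)
  val pGrade = Map(
    ("Dumb", "Easy")  -> Map("A" -> 0.3,  "B" -> 0.4,  "C" -> 0.3),
    ("Dumb", "Hard")  -> Map("A" -> 0.05, "B" -> 0.25, "C" -> 0.7),
    ("Smart", "Easy") -> Map("A" -> 0.9,  "B" -> 0.08, "C" -> 0.02),
    ("Smart", "Hard") -> Map("A" -> 0.5,  "B" -> 0.3,  "C" -> 0.2))
  val pSat = Map(
    "Dumb"  -> Map("BadSat" -> 0.95, "GoodSat" -> 0.05),
    "Smart" -> Map("BadSat" -> 0.2,  "GoodSat" -> 0.8))
  val pLetter = Map(
    "A" -> Map("NoLetter" -> 0.1,  "Letter" -> 0.9),
    "B" -> Map("NoLetter" -> 0.4,  "Letter" -> 0.6),
    "C" -> Map("NoLetter" -> 0.99, "Letter" -> 0.01))

  val worlds: Seq[World] = for {
    (d, pd) <- pDifficulty.toSeq
    (i, pi) <- pIntelligence.toSeq
    (g, pg) <- pGrade((i, d)).toSeq
    (s, ps) <- pSat(i).toSeq
    (l, pl) <- pLetter(g).toSeq
  } yield World(d, i, g, s, l, pd * pi * pg * ps * pl)

  type Event = World => Boolean
  def prob(event: Event): Double = worlds.filter(event).map(_.p).sum
  def conditional(event: Event, evidence: Event): Double =
    prob(w => event(w) && evidence(w)) / prob(evidence)
  def both(a: Event, b: Event): Event = w => a(w) && b(w)

  val letter: Event  = _.letter == "Letter"
  val hard: Event    = _.difficulty == "Hard"
  val gotA: Event    = _.grade == "A"
  val smart: Event   = _.intelligence == "Smart"
  val goodSat: Event = _.sat == "GoodSat"

  // Difficulty -> Grade -> Letter with the grade unobserved: influence flows.
  val letterPrior     = prob(letter)              // about 0.603656
  val letterGivenHard = conditional(letter, hard) // about 0.503

  // Same chain with the grade observed: difficulty no longer matters.
  val letterGivenA     = conditional(letter, gotA)             // 0.9
  val letterGivenAHard = conditional(letter, both(gotA, hard)) // 0.9

  // SAT <- Intelligence -> Grade with intelligence observed: SAT no longer matters.
  val aGivenSmart        = conditional(gotA, smart)                // 0.66
  val aGivenSmartGoodSat = conditional(gotA, both(smart, goodSat)) // 0.66
}
```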

More notes to come.

[1] Prof Allen B Downey's blog
[2] Daphne Koller, Coursera.