Sunday, June 18, 2017

Entropy


In my physics degree, we were asked to calculate the entropy of a chess board. Students smarter than me snorted at this silly exercise. Yet, entropy is just a statistical measure. You cannot measure it directly (there are no entropy-ometers) but it exists everywhere and can be used in odd places like machine learning (eg maximum entropy classifiers which "from all the models that fit our training data, selects the one which has the largest entropy").

Alternatively, you might want to find the configuration with the smallest entropy. An example is here where the quality of a clustering algorithm (k-means in this case) is assessed by looking at the entropy of the detected clusters. "As an external criteria, entropy uses external information — class labels in this case. Indeed, entropy measures the purity of the clusters with respect to the given class labels. Thus, if every cluster consists of objects with only a single class label, the entropy is 0. However, as the class labels of objects in a cluster become more varied, the entropy value increases."

For instance, say you are trying to find the parameters for an equation such that it best fits the data. "At the very least, you need to provide ... a score for each candidate parameter it tries. This score assignment is commonly called a cost function. The higher the cost, the worse the model parameter will be... The cost function is derived from the principle of maximum entropy." [1]

What is Entropy

I found this description of heads (H) and tails (T) from tossing a coin enlightening:

"If the frequency [f] of H is 0.5 and f(T) is 0.5, the entropy E, in bits per toss, is

-0.5 log2 0.5

for heads, and a similar value for tails. The values add up (in this case) to 1.0. The intuitive meaning of 1.0 (the Shannon entropy) is that a single coin toss conveys 1.0 bit of information.

Contrast this with the situation that prevails when using a "weighted" or unfair penny that lands heads-up 70% of the time. We know intuitively that tossing such a coin will produce less information because we can predict the outcome (heads), to a degree. Something that's predictable is uninformative. Shannon's equation gives

-0.7 log2 (0.7) = 0.3602

for heads and

-0.3 log2  (0.3) = 0.5211

for tails, for an entropy of 0.8813 bits per toss. In this case we can say that a toss is 11.87% [1.0 - 0.8813] redundant."

Here's another example:

X = 'a' with probability 0.5
    'b' with probability 0.25
    'c' with probability 0.125
    'd' with probability 0.125

The entropy of this configuration is:

H(X) = -0.5 log(0.5) - 0.25 log(0.25) - 0.125 log(0.125) - 0.125 log(0.125) = 1.75 bits

What does this actually mean? Well, if the average number of questions asked ("Is X 'a'? If not, is it 'b'? ...") then "the resulting expected number of binary questions required is 1.75" [2].

Derivation of entropy

Basically, we want entropy to be extensive. That is "parameters that scale with the system. In other words U(aS,aV,aN)=aU(S,V,N)".

So, if SX is the entropy of system X, then the combined entropy of two systems, A and B, would be:

SC = SA + SB

Second, we want it to be largest when all the states are equally probably. Let's call the function f then the average value is:

S = <f> = Σipif(pi)               Equation 1

Now, given two sub-systems, A and B, the system they make up C will have entropy:

SC = ΣiΣjpipjf(pi)f(pj)            Equation 2

that is, the we are summing probabilities over the states where A is in state i and B is in state j.

For equation 2 to conform to the form of equation 1, let's introduce the variable pij=pipj.Then:

SC = ΣiΣjpijf(pij)

For this to be true, f = C ln p since ln(ab) = ln(a) + ln(b).

This is the argument found here.

Uses of entropy

Uses are plentiful but here are a couple that caught my eye.

The Count Bayesie blog shows that you can show how different distributions differ from each other by using the entropy. The analogy he uses is sending a distribution of teeth a species of alien has back home when you have limited bandwidth.

While looking at Spark's decision tree algorithm you see that each split is trying to increase the entropy.

[1] Machine Learning with Tensor Flow.
[2] Elements of Information Theory.

Sunday, June 11, 2017

Scala Equality


The code:

Set(1) == List(1)

will always return false but will also always compile. This is pathological.

"Equality in Scala is a mess ... because the language designers decided Java interoperability trumped doing the reasonable thing in this case." (from here).

There are ways of solving this problem, the simplest being to use === that a number of libraries offer. Here are a three different ways of doing it:

  def scalazTripleEquals(): Unit = {
    import scalaz._
    import Scalaz._
//    println(List(1) === List("1")) // doesn't compile :)
  }

  def scalaUtilsTripleEquals(): Unit = {
    import org.scalautils.TypeCheckedTripleEquals._
//    println(List(1) === List("1")) // doesn't compile :)
  }

  def scalacticTripleEquals(): Unit = {
    import org.scalactic._
    import TypeCheckedTripleEquals._
//    println(List(1) === (List("1"))) // doesn't compile :)
  }

But what about incorporating it into the Scala language itself?

Scala creator, Martin Odersky's views can be found here here. He is proposing "it is opt-in. To get safe checking, developers have to annotate with @equalityClass ... So this means we still keep universal equality as it is in Scala now - we don’t have a choice here anyway, because of backwards compatibility."

Warning: ScalaTest

There is a horrible gotcha using Scalactic and ScalaTest (which is odd since they are stable mates). The problem is that you want compilation to fail for something likes this:

import org.scalatest.{FlatSpec, Matchers}

class MyTripleEqualsFlatSpec extends FlatSpec with Matchers {

  "triple equals" should "not compile" in {
    List(1)  should === (List("1"))
  }

}

Only it doesn't. It happily compiles! This is not what was expected at all given the code in scalacticTripleEquals() above. The solution can be found here. You must change the class signature to:

class MyTripleEqualsFlatSpec extends FlatSpec with Matchers with TypeCheckedTripleEquals {

for the compiler to detect this error.

Arrays

As an addendum, I just discovered Scala gives a nice way to compare arrays by value rather than by reference. It's:

a.deep == b.deep

(see here).

Saturday, June 10, 2017

Either Mither


Either has changed. A superficial Google might suggest that Either represents just that - either A or B.

"You cannot, at least not directly, use an Either instance like a collection, the way you are familiar with from Option and Try. This is because Either is designed to be unbiased.

"Try is success-biased: it offers you map, flatMap and other methods that all work under the assumption that the Try is a Success, and if that’s not the case, they effectively don’t do anything, returning the Failure as-is." (from here)

And: "if you use Either for error reporting, then it is true that you want it to be biased to one side, but that is only one usecase of many, and hardcoding a single special usecase into a general interface smells of bad design" from here.

But then you see this in the Scala documentation: "Either is right-biased, which means that Right is assumed to be the default case to operate on. If it is Left, operations like map, flatMap, ... return the Left value unchanged".

This apparent contradiction arises as Scala 2.12 changed Either. It has become biased.

Let's demonstrate using ScalaTest. First, we define an Either and some functions to act on it:

    type LeftType   = List[Int]
    type RightType  = Int
    type EitherType = Either[LeftType, RightType]
    val left        = Left(List[Int]())
    val right       = Right(1)

    val rightFn: (RightType) => RightType = _ + 1
    val leftFn:  (LeftType)  => LeftType  = _ :+ 1

Then, the test looks like:

    "Either" should {
      "be agnostic if left or right has been explicitly stated on an Either that's a Left" in {
        val either: EitherType = left
        either.left.map(leftFn) shouldEqual Left(List(1))
        either.right.map(rightFn) shouldEqual left // unchanged
      }
      "be agnostic if left or right has been explicitly stated on an Either that's a Right" in {
        val either: EitherType = right
        either.right.map(rightFn) shouldEqual Right(2)
        either.left.map(leftFn) shouldEqual right // unchanged
      }
.
.

So far, so good. But this following code is new in 2.12 (it won't compile in earlier versions):

      "ignore the unbiased side" in {
        val either: EitherType = left
        either.map(rightFn) shouldEqual left
      }
      "map the biased side" in {
        val either: EitherType = right
        either.map(rightFn) shouldEqual Right(2)
      }

Either's new monadic functions cannot take anything other than a function of type (RightType) => ...

The mnemonic is that if you're using this for error reporting, then Right is the right answer.