Friday, December 27, 2019

CNNs and network scanning with TensorFlow


Introduction

Port scanning leaves tell-tale signs in flow logs that are easy to spot when we visualize them (here is an example). Wouldn't it be good if we could teach a neural net to check our logs for such activity?

To this end, I've written some Python code using TensorFlow (here) that fakes some port data. Some of the fakes are clearly the result of a port scan, others are just random connections. Can a convolutional neural network (CNN) spot the difference?

Fake data look like this:

Graphical representation of network connection activity
Notice that port scanning shows up as straight lines: a vertical line might represent a single box having all of its ports scanned, while a horizontal line might represent a lot of boxes being scanned on just one port.

Realistically, nefarious actors only scan a subset of ports (typically those below 1024). I'll make the fake data more sophisticated later.
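
Just to make that concrete, here is a minimal sketch of the idea (the function names and the NumPy representation are my own, not the linked code): traffic is a 2D grid with hosts along one axis and ports along the other, and a scan is simply a filled-in row or column.

    import numpy as np

    def random_traffic(n_hosts=24, n_ports=24, n_connections=40):
        # normal activity: a sprinkling of unrelated host/port connections
        grid = np.zeros((n_hosts, n_ports))
        hosts = np.random.randint(0, n_hosts, n_connections)
        ports = np.random.randint(0, n_ports, n_connections)
        grid[hosts, ports] = 1
        return grid

    def port_scan(n_hosts=24, n_ports=24):
        # a scan: one host probed on every port, i.e. a straight line in the grid
        grid = np.zeros((n_hosts, n_ports))
        grid[np.random.randint(0, n_hosts), :] = 1
        return grid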

A brief introduction to CNNs

"The big idea behind convolutional neural networks is that a local understanding of an image is good enough. As a consequence, the practical benefit is that fewer parameters greatly improve the time it takes to learn as well as lessens the amount of data required to train the model." [2]

"Pooling layers are commonly inserted between successive convolutional layers. We want to follow convolutional layers with pooling layers to progressively reduce the spatial size (width and height) of the data representation. Pooling layers reduce the data representation progressively over the network and help control overfitting. The pooling layer operates independently on every depth slice of the output." [1]

"The pooling layer uses the max() operation to resize the input data spatially (width, height). This operation is referred to as max pooling. With a 2x2 filter size, the max() operation is taking the largest of four numbers in teh filter area." [1]

Code

I stole the code from Machine Learning with TensorFlow but, while it reshapes its data into 24x24 images, my images have different dimensions, and when I ran the book's code against my data I got errors. Apparently, the line (p184) that hard-codes the size of the fully connected layer as 6*6*64 was causing me problems.

Where was it getting 6*6*64 from? The 64 is easy to explain (it's the arbitrary number of convolution filters we use in the previous layer) but the 6*6...?

When using MLwTF's code with my data, TensorFlow was complaining that logits_size and labels_size were not the same. What does this mean?
Logits is an overloaded term which can mean many different things. In Math, Logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf)) ... Probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive to > 0.5. 
In ML, it can be the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.
(from StackOverflow).
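
Here's a quick sketch of that normalization step (my own example, assuming TensorFlow 2.x): the raw logits go in, a probability for each class comes out.

    import tensorflow as tf

    logits = tf.constant([2.0, 1.0, 0.1])  # raw, non-normalized scores, one per class
    probs = tf.nn.softmax(logits)          # normalized so they sum to 1
    print(probs.numpy())                   # approximately [0.659 0.242 0.099]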

After a lot of trial and error, I figured out that the 6*6 was coming from max pooling. This is where the algorithm "sweeps a window across an image and picks the pixel with the maximum value" [MLwTF]. The ksize in the code is 2 and this is the third layer, so we have max-pooled the original matrix twice already. So, the 6 comes from our original size (24 pixels) twice max-pooled by a size of 2, giving 24/(2*2) = 6.

I noticed that the stride length also plays a part in the calculation of the size of the fully connected layer ("for example, a stride length of 2 means the 5 × 5 sliding window moves by 2 pixels at a time until it spans the entire image" [MLwTF]). In the example in MLwTF, the stride length is 1 so it makes no difference in this particular case but, in general, we also need to divide the image size at each layer by this value.

So, the calculation of the size of the fully connected layer looks like (in Python):

    def fully_connected_size(self, i, j, depth):  # signature is my reconstruction, not the original code
        # assumes `import math`; self.stride_size and self.ksize hold the
        # stride length and the max-pooling window size for this network
        for _ in range(depth):
            i = math.ceil(i / self.stride_size)
            i = math.ceil(i / self.ksize)
            j = math.ceil(j / self.stride_size)
            j = math.ceil(j / self.ksize)
        return i * j

where i and j are originally set to the size of the input data and depth is the number of convolution/max-pooling layers that come before the layer whose size we want to calculate. Plugging in MLwTF's numbers (a 24x24 image, a ksize of 2, a stride of 1 and a depth of 2, since the data has been max-pooled twice by the third layer) gives 6 * 6 = 36, which is exactly the 6*6 the book hard-codes.

[1] Deep Learning - a practitioner's guide
[2] Machine Learning with TensorFlow

Sunday, December 1, 2019

Monad Transformers: code smell in disguise?


What are they?

Luka Jacobowitz (Gitter @LukaJCB 01/04/20 01:48) says "the definition of monad transformer is that for any MT[F, A] if F is a monad then MT[F, *] is also a monad."

Noel Welsh gives us an example of a 3-way monad: None, Some(x) or a (String) error:
Monad transformers allow us to stack monads. E.g., stacking an Option with \/ (that is, Scalaz's version of Either) into one handy monad. The type OptionT[M[_], A] is a monad transformer that constructs an Option[A] inside the monad M. So the first important point is that monad transformers are built from the inside out.
Here's an example:

  import cats.Monad
  import cats.data.OptionT

  def transformAndAdd[F[_]: Monad](fa: OptionT[F, Int], fb: OptionT[F, Int]): OptionT[F, Int] = for {
    a <- fa
    b <- fb
  } yield a + b


Here, F can be any monad. For instance:

    import cats.effect.IO

    val faIO = OptionT[IO, Int](IO(Some(1)))
    val fbIO = OptionT[IO, Int](IO(Some(2)))

    transformAndAdd(faIO, fbIO)

will wrap the result of the addition in an IO (the result is an OptionT[IO, Int], whose value is an IO[Option[Int]]). You can do something similar for Future:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    val fa = OptionT[Future, Int](Future(Some(1)))
    val fb = OptionT[Future, Int](Future(Some(2)))

    println(Await.result(transformAndAdd(fa, fb).value, 1.seconds))

We don't care what the outer monad is as long as the inner one, in this case, is an Option.

Drawbacks

Not all combinations of monads can be used with transformers. This SO answer explains why:
The important thing here is that there is no FutureT defined in cats, so you can compose Future[Option[T]], but can't do that with Option[Future[T]] ... 
That means you can compose Future with Option (that is, Future[Option[T]]) into one single monad (coz OptionT exists), but you can't compose Option with Future (that is, Option[Future[T]]) without knowing that Future is something else besides being a monad (even though they're inherently applicative functors - applicative is not enough to build either a monad or a monad transformer on it)
The omniscient Rob Norris explains why:
Rob Norris @tpolecat
Like Option[IO]. There's no IOT transformer. 
PhillHenry @PhillHenry
Because nobody has written one? Or it can't be done?
(Sorry for such a silly question. Friday night here.) 
Rob Norris @tpolecat
It can't be done. You would have to define Option[IO[A]] => (A => Option[IO[B]]) => Option[IO[B]] but there's no way to do that because you can't get the A out.
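
To see where you get stuck, here is a sketch of the transformer you would have to write (my own code, using cats and cats-effect; IOT is hypothetical and, as Rob says, can never be completed):

  import cats.Monad
  import cats.effect.IO

  // A hypothetical IOT would have to wrap an IO inside some outer monad F...
  final case class IOT[F[_], A](value: F[IO[A]]) {
    // ...but flatMap is exactly the function Rob describes (with F = Option):
    // we would need to turn F[IO[A]] into F[IO[B]] by applying f, yet f needs
    // an A and the only A is locked inside an un-run IO. There is no pure way
    // to get it out, so this can never be implemented.
    def flatMap[B](f: A => IOT[F, B])(implicit F: Monad[F]): IOT[F, B] = ???
  }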

Should we even use them?

There appear to be two schools of thought on Monad Transformers. People aligned to Scalaz appear to hate them in Scala.
jrciii @jrciii
Are monad transformer stacks idiomatic scalaz? If not is there a better way to get either an error or a value with logging along the way? 
beezee @beezee
eh idiomatic i have to defer to others. i feel like the idioms change every time there's a conference
i used them a lot and now consider that code legacy
i dream of the day it's all gone
it seems very in fashion to mimic mtl
scalaz has MonadError and MonadWriter
the nice thing about it is you don't worry about plumbing in your business code and you can be explicit about what capabilities are required for every function
i'm sure others who have used it more than me could describe pitfalls there too 
John A. De Goes @jdegoes
Monad transformers are not idiomatic in Scala. Avoid them at all costs. [See John's article here]

Emily Pillmore
The real question you need to ask after deciding Monad Transformers are not good in Scala is why that is true. The answer is not that MT are bad, per se, but that Scala has very poor facilities for dealing with nested boxed and unboxed values alike. It simply is not a language that has reasonable data formats 
John A. De Goes @jdegoes
@PhillHenry Yeah, what Emily says. Monad transformers work well in Haskell (actually way more performant than, e.g. free monads). But Scala is not Haskell. In Scala, monad transformers destroy performance and type inference, and add tedious boilerplate unless used in combination with type classes (in which case you just have to worry about type inference and performance).

[Gitter chat, 20 & 26 Feb 2019]

But the Cats crowd don't seem to mind:
Rob Norris @tpolecat
The correct take on monad transformers is that they're fine. Write your program in terms of abstract effect F and then at the end of the world if you have to instantiate it into a transformer stack, who cares? It's awkward but just for a line or two. 
Gavin Bisesi @Daenyth
And unless your app's runtime does no database or network IO, any performance overhead from using transformers is just not worth caring about. IO drowns it out by orders of magnitude

[Gitter chat, 1 Nov 2019]

But even the Cats people say: "Strong claim: Thou shalt not use Monad Transformers in your interface" (Practical FP in Scala by Gabriel Volpe)