Friday, January 15, 2021

Neural Nets and Anomaly Detection pt 1


More playing with anomaly detection, this time with Keras. All the data is synthetic and can be generated from the code on my GitHub here.

The Data

Our data looks like this:

Typical synthetic data

The top row shows noisy data with a clear global periodicity. The bottom row shows noisy data in which each column has a periodicity independent of the other columns.
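The real generator is in the GitHub code linked above. As a rough stand-in, here is a minimal NumPy sketch of the two kinds of square, assuming 16x16 squares, sinusoidal signals with periods between 4 and 8 rows, and Gaussian noise. The function names, the square size, the period range and the noise level are all hypothetical choices for illustration, not the post's actual parameters.

```python
import numpy as np

def make_square(size=16, global_period=True, noise=0.3, rng=None):
    """Return one noisy, periodic size x size square (hypothetical generator).

    global_period=True:  every column shares one sinusoid along the rows.
    global_period=False: each column gets its own random period and phase.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows = np.arange(size)[:, None]                      # shape (size, 1): row index
    if global_period:
        period = rng.uniform(4, 8)                       # one period for the whole square
        phase = rng.uniform(0, 2 * np.pi)
        signal = np.sin(2 * np.pi * rows / period + phase) * np.ones((1, size))
    else:
        periods = rng.uniform(4, 8, size=(1, size))      # one period per column
        phases = rng.uniform(0, 2 * np.pi, size=(1, size))
        signal = np.sin(2 * np.pi * rows / periods + phases)
    return signal + noise * rng.standard_normal((size, size))

def make_dataset(n_per_class=256, size=16, seed=0):
    """Stack n_per_class squares of each kind; label 0 = global, 1 = per-column."""
    rng = np.random.default_rng(seed)
    x = np.stack([make_square(size, True, rng=rng) for _ in range(n_per_class)]
                 + [make_square(size, False, rng=rng) for _ in range(n_per_class)])
    y = np.repeat([0, 1], n_per_class)
    return x, y

x, y = make_dataset()
print(x.shape, y.shape)  # (512, 16, 16) (512,)
```

With the noise switched off, every column of a globally periodic square is identical, while the per-column squares differ from column to column; that column-to-column structure is the only thing the model has to pick up on.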

Can a Variational Auto Encoder tell the difference?

The Model

The code for our neural net was adapted from this StackOverflow answer here. After very little training (roughly 10 epochs), the VAE could tell which group a given square belonged to with an accuracy of about 90%.

The data projected onto the VAE's bottleneck

Note that I am not going to tune the VAE to squeeze out the last drop of accuracy. I'm happy with about 90%.

The trick with VAEs is that the data we train on is also the output we expect, simply because we're asking the neural net to reproduce what it's been given. The reproduction won't be exact, since we're deliberately squeezing the data through a narrow bottleneck; this is the very definition of an auto encoder. The 'variational' part comes from the fact that the neural net learns the parameters of a probability distribution over the compressed representation (typically the mean and variance of a Gaussian) rather than a single fixed point [Quora].
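To make the 'variational' part concrete, here is a minimal NumPy sketch of the two pieces that distinguish a VAE from a plain auto encoder, assuming the standard Gaussian formulation: the encoder outputs a mean and a log-variance per latent dimension, a latent vector is sampled from that distribution (the "reparameterization trick"), and the loss adds a KL-divergence term pulling each distribution towards a standard normal. This is not the StackOverflow code the post used; the function names and the 2-D bottleneck are assumptions for illustration.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

def vae_loss(x, x_reconstructed, z_mean, z_log_var):
    """Squared reconstruction error plus the KL term, averaged over the batch.

    x and x_reconstructed are flattened to shape (batch, features).
    """
    recon = np.sum((x - x_reconstructed) ** 2, axis=-1)
    return np.mean(recon + kl_to_standard_normal(z_mean, z_log_var))

# Hypothetical encoder outputs for a batch of 4 flattened 16x16 squares.
rng = np.random.default_rng(0)
z_mean = rng.normal(size=(4, 2))
z_log_var = rng.normal(scale=0.1, size=(4, 2))
z = sample_latent(z_mean, z_log_var, rng)          # what the decoder would receive
x = rng.normal(size=(4, 256))
x_reconstructed = x + 0.1 * rng.normal(size=(4, 256))
print(vae_loss(x, x_reconstructed, z_mean, z_log_var))
```

Sampling through mu + sigma * eps, rather than sampling z directly, keeps the path from the encoder outputs to the loss differentiable, which is what lets the encoder be trained by ordinary backpropagation.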

The Results

The trouble with VAEs is that although you can plot a pretty graph of the encoded data, you still need some algorithm to separate the groups for you. I'm going to use scikit-learn to run the KMeans algorithm, since I know there are exactly 2 groups (that is, k=2 here).
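A sketch of this step with scikit-learn's KMeans, assuming a 2-D bottleneck. Since the trained encoder isn't reproduced here, two hypothetical Gaussian blobs of 256 points stand in for the projected squares; in practice the input would be the encoder's mean output for each square. One detail worth knowing: KMeans numbers its clusters arbitrarily, so an accuracy score has to allow for the labels 0 and 1 being swapped.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustering_accuracy(y_true, y_pred):
    """Accuracy for two clusters, allowing for KMeans's arbitrary 0/1 labelling."""
    acc = np.mean(y_true == y_pred)
    return max(acc, 1 - acc)

# Hypothetical stand-in for the bottleneck projection: two blobs of 256 points.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal([-2.0, 0.0], 1.0, size=(256, 2)),
               rng.normal([2.0, 0.0], 1.0, size=(256, 2))])
y_true = np.repeat([0, 1], 256)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
y_pred = kmeans.fit_predict(z)
accuracy = clustering_accuracy(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
```

The true labels are only used for scoring; KMeans itself never sees them, which is the point of putting a clustering step after the auto encoder.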

This is fine in this example, where I have two groups of 256 elements each. But what if we're faced with imbalanced data and are looking for outliers? KMeans does not do so well with that. It doggedly tries to find two clusters of roughly equal size, giving an accuracy of about 50%: the monkey score.
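Here is a small sketch of that failure mode on hypothetical data: 256 'normal' points from a 2-D Gaussian, standing in for the bottleneck projection, plus one hypothetical outlier. With k=2, KMeans minimizes the within-cluster sum of squared distances, and on data like this cutting the dense bulk roughly in half lowers that sum far more than isolating a single point does. The outlier therefore ends up sharing a cluster with about half of the inliers.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(256, 2))   # hypothetical bulk of normal squares
outlier = np.array([[4.0, 4.0]])                # one hypothetical anomaly
z = np.vstack([inliers, outlier])
y_true = np.r_[np.zeros(256, dtype=int), 1]     # 1 marks the outlier (last row)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)
sizes = np.bincount(labels)

# Score the labelling both ways round, since KMeans's cluster ids are arbitrary.
acc = np.mean(labels == y_true)
acc = max(acc, 1 - acc)
print("cluster sizes:", sizes)
print("outlier shares its cluster with", sizes[labels[-1]] - 1, "inliers")
print(f"accuracy: {acc:.2f}")
```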

Spot the outlier

To this end, we can use the DBSCAN algorithm, again from scikit-learn. It finds my outlier, but it does take a bit of tuning.
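A sketch of the DBSCAN approach on the same kind of hypothetical data. DBSCAN needs no cluster count. Instead it labels any point that is not within `eps` of a sufficiently dense region as noise (label -1), which is exactly what an outlier detector wants. The `eps` and `min_samples` values below are guesses that happen to suit this toy data, not the values used in the post. These two parameters are what need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(256, 2))   # hypothetical bulk of normal squares
outlier = np.array([[6.0, 6.0]])                # one hypothetical anomaly (index 256)
z = np.vstack([inliers, outlier])

# eps: neighbourhood radius; min_samples: points (including itself) a core point needs.
db = DBSCAN(eps=1.0, min_samples=5).fit(z)
outliers = np.flatnonzero(db.labels_ == -1)     # DBSCAN marks noise points with -1
print("flagged as noise:", outliers)
```

If `eps` is shrunk too far, points in the sparse tail of the bulk start being flagged as noise too. If it grows too large, the outlier falls within reach of a core point and gets absorbed into the main cluster.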

Still playing...

